
How To Convert A Pandas Dataframe Into A Numpy Array With The Column Names

This must use vectorized methods, nothing iterative. I would like to create a numpy array from a pandas dataframe. My code: import pandas as pd _df = pd.DataFrame({'itme': ['book',

Solution 1:

  • Do a quick search for a val by its "item" and "color" with one of the following options:
    1. Use pandas Boolean indexing
    2. Convert the dataframe into a numpy.recarray using pandas.DataFrame.to_records, and use Boolean indexing on the result
  • item is the name of a method in both pandas and numpy (for example Series.item and ndarray.item), so don't use 'item' as a column name if you rely on attribute access. The column has been renamed to '_item'.
  • As an FYI, numpy is a pandas dependency, and much of pandas vectorized functionality directly corresponds to numpy.
import pandas as pd
import numpy as np

# test data
df = pd.DataFrame({
    '_item': ['book', 'book', 'car', 'car', 'bike', 'bike'],
    'color': ['green', 'blue', 'red', 'green', 'blue', 'red'],
    'val': [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61],
})

# Use pandas Boolean indexing to select the matching row
selected = df[(df._item == 'book') & (df.color == 'blue')]

# print(selected)
_item color    val
 book  blue -109.6

# Alternatively, create a recarray
v = df.to_records(index=False)

# display(v)
rec.array([('book', 'green',  -22.7 ), ('book', 'blue', -109.6 ),
           ('car', 'red',  -57.19), ('car', 'green',  -11.2 ),
           ('bike', 'blue',  -25.6 ), ('bike', 'red',  -33.61)],
          dtype=[('_item', 'O'), ('color', 'O'), ('val', '<f8')])

# search the recarray
selected = v[(v._item == 'book') & (v.color == 'blue')]

# print(selected)
[('book', 'blue', -109.6)]
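
Since the question asks for a numpy array that keeps the column names, it may help to see where those names end up. The following is a minimal sketch, assuming the same test data as above; the variable names `names`, `mask` and `vals` are only illustrative. It shows that the record array stores the column names in its dtype, and that fields can be read by key as well as by attribute.

```python
import pandas as pd

# same test data as above
df = pd.DataFrame({
    '_item': ['book', 'book', 'car', 'car', 'bike', 'bike'],
    'color': ['green', 'blue', 'red', 'green', 'blue', 'red'],
    'val': [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61],
})
v = df.to_records(index=False)

# the column names are kept as the field names of the dtype
names = v.dtype.names

# fields can be read by key, which also works for names that
# clash with an existing method or attribute
mask = (v['_item'] == 'book') & (v['color'] == 'blue')
vals = v['val'][mask]

print(names)
print(vals)
```

Key-based access (`v['val']`) is the safer choice when a column name might collide with a method of the array, which is the reason 'item' was renamed in the first place.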

Update in response to OP edit

  • You must first reshape the dataframe using pandas.DataFrame.pivot, and then use the previously mentioned methods.
dfp = df.pivot(index='_item', columns='color', values='val')

# display(dfp)
color   blue  green    red
_item                     
bike   -25.6    NaN -33.61
book  -109.6  -22.7    NaN
car      NaN  -11.2 -57.19

# create a numpy recarray
v = dfp.to_records(index=True)

# display(v)
rec.array([('bike',  -25.6,   nan, -33.61),
           ('book', -109.6, -22.7,    nan),
           ('car',    nan, -11.2, -57.19)],
          dtype=[('_item', 'O'), ('blue', '<f8'), ('green', '<f8'), ('red', '<f8')])

# select data
selected = v.blue[(v._item == 'book')]

# print(selected)
array([-109.6])
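
The pivoted lookup can be wrapped in a small function. This is only a sketch under the assumptions of the example above: `lookup` is a hypothetical helper, not part of pandas or numpy, and it relies on the pivoted record array having an '_item' field plus one field per color. Combinations that do not exist in the data come back as NaN, so the result is checked with `np.isnan`.

```python
import numpy as np
import pandas as pd

# same test data as above
df = pd.DataFrame({
    '_item': ['book', 'book', 'car', 'car', 'bike', 'bike'],
    'color': ['green', 'blue', 'red', 'green', 'blue', 'red'],
    'val': [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61],
})
v = df.pivot(index='_item', columns='color', values='val').to_records(index=True)


def lookup(rec, item, color):
    """Hypothetical helper: values of the `color` field for rows whose '_item' equals `item`."""
    return rec[color][rec['_item'] == item]


found = lookup(v, 'book', 'blue')     # an existing combination
missing = lookup(v, 'car', 'blue')    # there is no blue car in the data

print(found)
print(np.isnan(missing))
```

Both the comparison and the field selection are vectorized numpy operations, so no Python-level loop is involved.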
