
Converting Pandas Dataframe Types

I have a pandas DataFrame created through a MySQL call, which returns the data as object dtype. The data is mostly numeric, with some 'na' values. How can I cast the data to a numeric type?

Solution 1:

Use the DataFrame replace method to turn the 'na' strings into NaN:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

print(df)

# replace() returns a new DataFrame, so assign the result back
df = df.replace('na', np.nan)

print(df)

It's worth pointing out that df.replace('na', np.nan) on its own does not modify df: it returns a new DataFrame, so you must assign the result back to the existing variable (or pass inplace=True).
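A minimal sketch of the assign-back point, using a cut-down version of the k2 column above. It checks that the original frame still contains the 'na' strings after calling replace, and that they are gone only once the result is assigned back. Depending on the pandas version, the column may still have object dtype after replace, so a follow-up pd.to_numeric call is the reliable way to get a numeric dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'k2': [1, 'na', 2, 'na', 3, 4, 4]})

# replace() returns a new DataFrame and leaves df untouched
replaced = df.replace('na', np.nan)
original_na_count = int((df['k2'] == 'na').sum())  # still 2

# Assigning the result back is what actually updates df
df = replaced
remaining_na_count = int((df['k2'] == 'na').sum())  # now 0

# Force a numeric dtype explicitly rather than relying on replace()
df['k2'] = pd.to_numeric(df['k2'])
print(df['k2'].dtype)
```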


Solution 2:

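df = df.convert_objects(convert_numeric=True) will work in most cases on older pandas versions. Note that convert_objects was deprecated in pandas 0.17 and has since been removed; on current versions, apply pd.to_numeric(..., errors='coerce') to each column instead.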

I should note that this copies the data. It would be preferable to get the data into a numeric type when it is first read. If you post your code and a small example, someone may be able to help you with that.
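On current pandas, a rough equivalent of convert_objects(convert_numeric=True) is pd.to_numeric with errors='coerce', applied column by column. This sketch reuses the example frame from Solution 1; the numeric_cols list is a hypothetical stand-in for whichever columns you know should be numeric. Restricting the conversion this way keeps a text column such as k1 from being turned entirely into NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

# Hypothetical: the columns you know should hold numbers
numeric_cols = ['k2']

for col in numeric_cols:
    # errors='coerce' turns unparseable values such as 'na' into NaN
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

If every column really is numeric, df.apply(pd.to_numeric, errors='coerce') converts the whole frame in one call, but any column with no parseable values (like k1 here) comes back as all NaN.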


Solution 3:

This is what Tom suggested, and it is correct:

In [134]: s = pd.Series(['1','2.','na'])

In [135]: s.convert_objects(convert_numeric=True)
Out[135]: 
0     1
1     2
2   NaN
dtype: float64
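Since convert_objects no longer exists in current pandas, here is a sketch of the same conversion with pd.to_numeric. With errors='coerce', strings that cannot be parsed (such as 'na') become NaN, and the result is float64, matching the session above.

```python
import pandas as pd

s = pd.Series(['1', '2.', 'na'])

# Modern replacement for s.convert_objects(convert_numeric=True)
converted = pd.to_numeric(s, errors='coerce')
print(converted)
```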

As Andy points out, convert_objects doesn't work directly on a Series that mixes strings with non-string values such as integers (I think that's a bug), so convert all the elements to strings first, then convert:

In [136]: s2 = pd.Series(['1','2.','na',5])

In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]: 
0     1
1     2
2   NaN
3     5
dtype: float64
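With pd.to_numeric, the astype(str) workaround should not be needed: it accepts an object Series that mixes strings and integers. This sketch checks that converting s2 directly gives the same float64 result as converting its string form.

```python
import pandas as pd

s2 = pd.Series(['1', '2.', 'na', 5])

# to_numeric handles the mixed str/int Series directly
direct = pd.to_numeric(s2, errors='coerce')

# The astype(str) round trip from the session above, for comparison
via_str = pd.to_numeric(s2.astype(str), errors='coerce')

print(direct)
```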
