
Converting Pandas Dataframe Types

I have a pandas DataFrame created through a MySQL call, which returns the data as object type. The data is mostly numeric, with some 'na' values. How can I cast the type of the data

Solution 1:

Use the replace method on dataframes:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

print(df)

# replace returns a new DataFrame, so assign the result back
df = df.replace('na', np.nan)

print(df)

I think it's helpful to point out that df.replace('na', np.nan) by itself won't change df: it returns a new DataFrame, so you must assign the result back to the existing dataframe.
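
To make the point about assignment concrete, here is a minimal sketch. It assumes a reasonably recent pandas; the single-column frame and the name still_has_na are only illustrative. Because the dtype you get back from replace can differ between pandas versions, the sketch casts explicitly with astype(float) afterwards.

```python
import numpy as np
import pandas as pd

# Illustrative column mixing numbers and the string 'na'.
df = pd.DataFrame({'k2': [1, 'na', 2, 'na', 3, 4, 4]})

# Calling replace without assignment leaves df untouched.
df.replace('na', np.nan)
still_has_na = bool((df['k2'] == 'na').any())

# Assign the result back, then cast explicitly to get a numeric dtype.
df = df.replace('na', np.nan)
df['k2'] = df['k2'].astype(float)

print(still_has_na)
print(df['k2'].dtype)
```

The first print shows that the 'na' strings were still present after the unassigned call; the second shows the column dtype after the explicit cast.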

Solution 2:

df = df.convert_objects(convert_numeric=True) will work in most cases.

I should note that this copies the data. It would be preferable to get it to a numeric type on the initial read. If you post your code and a small example, someone might be able to help you with that.

Solution 3:

This is what Tom suggested and is correct

In [134]: s = pd.Series(['1','2.','na'])

In [135]: s.convert_objects(convert_numeric=True)
Out[135]: 
0     1
1     2
2   NaN
dtype: float64

As Andy points out, this doesn't work directly when the Series mixes strings and numbers (I think that's a bug), so convert all elements to strings first, then convert:

In [136]: s2 = pd.Series(['1','2.','na',5])

In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]: 
0     1
1     2
2   NaN
3     5
dtype: float64
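
On pandas versions that no longer ship convert_objects, the same mixed Series can be handled with pd.to_numeric. This is a sketch assuming a recent pandas rather than the code from the answer above. As far as I can tell, the astype(str) step is not needed with this function, since it accepts a mix of strings and numbers.

```python
import pandas as pd

# Same mixed Series as above: three strings and one integer.
s2 = pd.Series(['1', '2.', 'na', 5])

# Unparseable entries such as 'na' become NaN instead of raising.
result = pd.to_numeric(s2, errors='coerce')

print(result)
```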
