
Nan In Pandas Series.tolist() Behaves Differently From Nan In List

Why does

>>> import pandas as pd
>>> import numpy as np
>>> list(pd.Series([np.nan, np.nan, 2, np.nan, 2])) == [np.nan, np.nan, 2, np.nan, 2]

return False? I get t

Solution 1:

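The root issue, as @emilaz already stated, is that nan == nan is always False. What matters for your observation, however, is object identity: when Python compares two lists, it treats elements that are the very same object as equal before falling back to ==.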

Observe the following object references between list and pd.Series:

>>> s = pd.Series([np.nan, np.nan, np.nan, 2, 2, 1, 5])
>>> s.apply(id)
0    149706480
1    202463472
2    202462336
3    149706912
4    149706288
5    149708784
6    149707200
dtype: int64

>>> l = [np.nan, np.nan, np.nan, 2, 2, 1, 5]
>>> list(map(id, l))
[68634768, 68634768, 68634768, 1389126848, 1389126848, 1389126832, 1389126896]

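In the list, every np.nan entry refers to the same object as the imported np.nan, whereas the Series hands back a new object for each element (which makes sense for pandas usage). A list built from the Series therefore shares no nan objects with the literal list, so the comparison falls back to nan == nan, which is False.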
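The role of object identity can be reproduced without pandas. The following is a minimal sketch that relies on the documented behavior of built-in containers, which treat identical objects as equal before calling ==. The variable names are illustrative only.

```python
shared_nan = float("nan")   # one NaN object, reused (like np.nan in a list literal)
other_nan = float("nan")    # a second, distinct NaN object

# NaN never equals NaN under ==, not even itself.
nan_equals_itself = shared_nan == shared_nan

# List comparison checks identity first, so the same object on both sides counts as equal.
same_object_lists_equal = [shared_nan, 2] == [shared_nan, 2]

# Distinct NaN objects fall through to ==, which is False.
distinct_object_lists_equal = [shared_nan, 2] == [other_nan, 2]

print(nan_equals_itself, same_object_lists_equal, distinct_object_lists_equal)
# prints: False True False
```

The last case corresponds to comparing a list built from a Series with a list literal containing np.nan.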

The answer, therefore, is not to compare nan values this way. pandas has its own ways of dealing with nan, so depending on what you are actually trying to do, there may be a much simpler answer (e.g. df.groupby('some col').count()) than you envisioned.

Solution 2:

In Python, comparing nan with == always returns False, even when nan is compared with itself. So the following behavior is expected:

>>> import numpy as np
>>> np.nan == np.nan
False

This is why your list comparison returns False: the nan values coming out of the Series are compared with ==.

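A possible workaround is np.allclose with equal_nan=True (note that it compares numeric values within a tolerance rather than exactly):

>>> import pandas as pd
>>> import numpy as np
>>> foo = list(pd.Series([np.nan, np.nan, 2, np.nan, 2]))
>>> bar = [np.nan, np.nan, 2, np.nan, 2]
>>> np.allclose(foo, bar, equal_nan=True)
True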
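If an exact, element-by-element comparison is preferred over a tolerance-based one, a small helper can treat two NaN values as equal. This is only a sketch: lists_equal_nan is a hypothetical name, not a pandas or NumPy function, and it assumes flat lists of numbers.

```python
import math

def lists_equal_nan(left, right):
    """Return True if both lists match element by element, treating NaN as equal to NaN."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        # Two floats that are both NaN count as a match; everything else uses ==.
        both_nan = (
            isinstance(a, float) and isinstance(b, float)
            and math.isnan(a) and math.isnan(b)
        )
        if not (both_nan or a == b):
            return False
    return True

# Every float("nan") call creates a distinct object, as with values taken from a Series.
foo = [float("nan"), float("nan"), 2.0, float("nan"), 2.0]
bar = [float("nan"), float("nan"), 2, float("nan"), 2]

result = lists_equal_nan(foo, bar)
print(result)  # prints: True
```

Because np.float64 is a subclass of float, the same check also works on values taken from a NumPy array.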

This might interest you: comparing numpy arrays containing NaN.

For finding the most common element, I'd suggest using pandas and the value_counts() method:

>>> pd.Series([np.nan, np.nan, 2, np.nan, 2]).value_counts()
2.0    2

If you care about nan counts, you can simply pass dropna=False to the method:

>>> pd.Series([np.nan, np.nan, 2, np.nan, 2]).value_counts(dropna=False)
NaN    3
2.0    2
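To extract the most common element itself rather than reading it off the printed counts, one option is to take the label with the highest count. This is a sketch under the assumption that NaN should be counted like any other value; the variable names are illustrative.

```python
import math

import numpy as np
import pandas as pd

values = pd.Series([np.nan, np.nan, 2, np.nan, 2])

# Count every distinct value, keeping NaN as its own entry.
counts = values.value_counts(dropna=False)

# idxmax returns the index label (here: the value) with the largest count.
top_value = counts.idxmax()
top_count = int(counts.max())

# NaN cannot be tested with ==, so use math.isnan to check the result.
print(math.isnan(top_value), top_count)  # prints: True 3
```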
