Nan In Pandas Series.tolist() Behaves Differently From Nan In List
Solution 1:
The root issue, as @emilaz already stated, is that nan != nan in all cases. What matters in your observation, however, is object identity. Observe the object references in a list versus a pd.Series:
>>> s = pd.Series([np.nan, np.nan, np.nan, 2, 2, 1, 5])
>>> s.apply(id)
0    149706480
1    202463472
2    202462336
3    149706912
4    149706288
5    149708784
6    149707200
dtype: int64
>>> l = [np.nan, np.nan, np.nan, 2, 2, 1, 5]
>>> list(map(id, l))
[68634768, 68634768, 68634768, 1389126848, 1389126848, 1389126832, 1389126896]
Each np.nan in the list shares the same reference as the imported np.nan object, whereas pd.Series creates a new object for each element (which makes sense for pandas usage).
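This identity difference is also why membership tests diverge: the in operator checks identity before equality, so it finds the original np.nan object in a plain list, but not among the fresh float objects produced by Series.tolist(). A minimal sketch (not from the original answer):

```python
import numpy as np
import pandas as pd

l = [np.nan, np.nan, 2]
s = pd.Series(l)

# 'in' tests identity first, then equality. The list holds the very
# same np.nan object, so the identity check succeeds:
print(np.nan in l)            # True

# Series.tolist() produces new float objects; identity fails, and
# nan != nan means the equality fallback fails too:
print(np.nan in s.tolist())   # False
```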
The answer, therefore, is not to compare nan this way. pandas has its own ways of dealing with nan, so depending on your actual task there may be a much simpler answer (e.g. df.groupby('some col').count()) than you envisioned.
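As one illustration of letting pandas handle nan itself (a sketch under my own assumptions, not from the original answer), Series.isna() lets you compare two sequences element-wise while treating nan positions as equal:

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, np.nan, 2, 2, 1])
b = pd.Series([np.nan, np.nan, 2, 2, 1])

# Plain == is False wherever either side is NaN...
naive = (a == b)

# ...so OR in the positions where both sides are NaN:
nan_aware = naive | (a.isna() & b.isna())

print(nan_aware.all())  # True
```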
Solution 2:
In Python, any equality comparison with nan returns False, so the following behavior is expected:

import numpy as np
np.nan == np.nan
>>>> False

This is why your list comparisons return False.
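The same holds for plain Python floats, not just np.nan; the reliable way to test for nan is math.isnan (or np.isnan). A quick sketch:

```python
import math
import numpy as np

assert np.nan != np.nan                # equality never holds for nan
assert float('nan') != float('nan')    # true for plain floats as well

assert math.isnan(np.nan)              # explicit nan tests do work
assert np.isnan(float('nan'))
```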
A possible workaround would be this:
import pandas as pd
import numpy as np
foo = list(pd.Series([np.nan, np.nan, 2, np.nan, 2]))
bar = [np.nan, np.nan, 2, np.nan, 2]
np.allclose(foo,bar, equal_nan=True)
>>>> True
This might interest you: comparing numpy arrays containing NaN.
For finding the most common element, I'd suggest using pandas and the value_counts()
method:
pd.Series([np.nan, np.nan, 2, np.nan, 2]).value_counts()
>>>> 2.0    2
If you care about nan counts, you can simply pass dropna=False to the method:
pd.Series([np.nan, np.nan, 2, np.nan, 2]).value_counts(dropna=False)
>>>> NaN    3
     2.0    2
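To then extract the most common element including nan (a sketch, assuming that is the goal): value_counts sorts by count in descending order, so the first index entry is the mode, and with dropna=False that can be NaN itself:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 2, np.nan, 2])

counts = s.value_counts(dropna=False)  # NaN is counted as well
most_common = counts.index[0]          # first entry = most frequent value

print(most_common)  # here NaN, which occurs 3 times vs. 2's two times
```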