Python Pandas Unique Value Ignoring NaN
I want to use unique in a groupby aggregation, but I don't want NaN in the unique result. An example dataframe: df = pd.DataFrame({'a': [1, 2, 1, 1, pd.np.nan, 3, 3], 'b': [0,0,1,1,1
Solution 1:
Define a function:
def unique_non_null(s):
    return s.dropna().unique()
Then use it in the aggregation:
df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null],
    'c': ['first', 'last', unique_non_null]
})
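A minimal runnable sketch of this approach follows. The sample frame is hypothetical (the one in the question is cut off), and the helper here returns a plain list instead of an ndarray, on the assumption that some pandas versions reject array results from a custom aggregation function on numeric columns with "Must produce aggregated value".

```python
import numpy as np
import pandas as pd


def unique_non_null(s):
    # Drop missing values first, then collect the distinct remaining ones.
    # Returning a list (an assumption, not part of the original answer)
    # keeps the result a single aggregated object per group.
    return s.dropna().unique().tolist()


# Hypothetical sample data; not the frame from the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, 3, 3],
    'b': [0, 0, 0, 1, 1, 1],
    'c': ['foo', np.nan, 'foo', 'bar', 'baz', 'bar'],
})

result = df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null],
    'c': ['first', 'last', unique_non_null],
})
print(result)
```

The aggregated columns are named after the function, so the unique values of a end up under ('a', 'unique_non_null').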
Solution 2:
This will work for what you need (df.ffill() is equivalent to fillna(method='ffill'), which newer pandas versions deprecate):
df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
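Because you use min, max and unique, the repeated values introduced by the forward fill do not concern you. Note that the fill runs before the grouping, so a NaN in the first row of a group picks up the last value of the previous group.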
Solution 3:
Update 23 November 2020
This answer is terrible; don't use it. Please refer to @IanS's answer.
Earlier
Try ffill
df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
      c                          a
  first last           unique  min  max      unique
b
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]
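If NaN is the first element of a group, the above solution breaks: the forward fill either copies the last value of the previous group into it or, in the very first row of the frame, leaves the NaN in place.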
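The failure mode can be reproduced with a small sketch. The frame below is hypothetical and built so that group 1 starts with a NaN; the per-group dropna shown for comparison is one possible alternative, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 starts with a NaN in column 'a'.
df = pd.DataFrame({
    'a': [1, 2, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1],
})

# Forward fill runs over the whole frame, before any grouping,
# so the NaN in group 1 is replaced by the 2.0 from group 0.
leaked = df.ffill().groupby('b')['a'].unique()

# Dropping NaN inside each group avoids the leak.
clean = df.groupby('b')['a'].agg(lambda s: s.dropna().unique().tolist())

print(leaked.loc[1])  # contains 2.0, which never occurred in group 1
print(clean.loc[1])   # only the values actually present in group 1
```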
Solution 4:
You can use the code below, which counts the unique non-NaN values in each column:
df.apply(lambda x: len(x.dropna().unique()))
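As a rough check, the same counts should also come from the built-in DataFrame.nunique, which ignores NaN by default (dropna=True). The sample frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'a': [1, 2, 1, np.nan, 3],
    'c': ['foo', 'foo', np.nan, 'bar', 'baz'],
})

# Count distinct non-NaN values per column, as in the line above.
counts_apply = df.apply(lambda x: len(x.dropna().unique()))

# Built-in equivalent; dropna=True is the default.
counts_builtin = df.nunique()

print(counts_apply)
print(counts_builtin)
```

Both give a count per column, not the unique values themselves, so this answers a slightly different question from the groupby aggregation above.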