Python Pandas Unique Value Ignoring NaN

July 31, 2022

I want to use unique in a groupby aggregation, but I don't want NaN in the unique result. An example dataframe:

df = pd.DataFrame({'a': [1, 2, 1, 1, pd.np.nan, 3, 3], 'b': [0,0,1,1,1

Solution 1:

Define a function:

def unique_non_null(s):
    return s.dropna().unique()

Then use it in the aggregation:

df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null], 
    'c': ['first', 'last', unique_non_null]
})
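As a rough end-to-end sketch of this approach: the question's dataframe is truncated, so the `sample` frame below is made up for illustration (the values in `b` and `c` are assumptions). The helper here also returns a plain list via `.tolist()` instead of a NumPy array; that is a deliberate deviation from the answer's version, chosen on the assumption that not every pandas version accepts an array as the result of an aggregation function.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the original frame is cut off, so these values are assumed.
sample = pd.DataFrame({
    "a": [1, 2, 1, 1, np.nan, 3, 3],
    "b": [0, 0, 1, 1, 1, 1, 1],
    "c": ["foo", np.nan, "bar", "foo", "baz", "foo", "bar"],
})

def unique_non_null(s):
    # Drop NaN first, then collect the distinct values in order of appearance.
    # .tolist() turns the array into a plain Python list (assumed safer for agg).
    return s.dropna().unique().tolist()

result = sample.groupby("b").agg({
    "a": ["min", "max", unique_non_null],
    "c": ["first", "last", unique_non_null],
})

# The custom column is labelled with the function's name.
print(result[("a", "unique_non_null")])
print(result[("c", "unique_non_null")])
```

The custom aggregation shows up under the column label `unique_non_null`, next to the built-in `min`, `max`, `first` and `last` columns.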

Solution 2:

Forward-filling the missing values before grouping will work for what you need:

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Because you only aggregate with min, max and unique, the repeated values that forward-filling introduces do not affect the result.

Solution 3:

Update 23 November 2020

This answer is terrible; don't use it. Please refer to @IanS's answer.

Earlier

Try ffill

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

      c                          a                 
  first last           unique  min  max      unique
b                                                  
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]

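If NaN is the first element of a group, the above solution breaks: ffill does not respect group boundaries, so the gap is filled with the most recent non-null value from the preceding rows (which may belong to a different group), or stays NaN if nothing precedes it.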
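The failure mode can be reproduced with a small frame in which every group starts with a missing value. The `sample` data and the names `leaked` and `clean` are invented for this sketch; the comparison uses a `dropna()`-based aggregation that returns a list.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each group in 'b' starts with a NaN in column 'a'.
sample = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 3.0],
    "b": [0, 0, 1, 1],
})

# ffill runs over the whole column and ignores the groups in 'b':
# row 0 has nothing before it and stays NaN, row 2 picks up 1.0 from group 0.
leaked = sample.ffill().groupby("b")["a"].unique()
print(leaked)

# Dropping NaN inside each group keeps the groups independent.
clean = sample.groupby("b")["a"].agg(lambda s: s.dropna().unique().tolist())
print(clean)
```

In `leaked`, group 0 still contains NaN and group 1 contains a value that only ever appeared in group 0, whereas `clean` holds just the values actually observed in each group.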

Solution 4:

You can use the code below, which counts the distinct non-null values in each column:

    df.apply(lambda x: len(x.dropna().unique()))
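A small sketch of what this returns, using made-up data (the `sample` frame is an assumption, not the question's dataframe). Note that it yields a count per column rather than the unique values themselves; pandas' built-in `DataFrame.nunique()`, which skips NaN by default, should give the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
sample = pd.DataFrame({
    "a": [1, 2, 1, 1, np.nan, 3, 3],
    "c": ["foo", np.nan, "bar", "foo", "baz", "foo", "bar"],
})

# Count of distinct non-null values per column, as in the snippet above.
counts = sample.apply(lambda x: len(x.dropna().unique()))
print(counts)

# Built-in alternative: nunique() excludes NaN unless dropna=False is passed.
builtin = sample.nunique()
print(builtin)
```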
