
Groupby Sum, Index Vs. Column Results

For the following DataFrame:

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]}, columns=['group', 'data'])
print(df)

  group  data
0     a     5
1     a    10
2     b   100
3     b    30

Solution 1:

It is better to use GroupBy.transform here. It returns a Series the same size as the original DataFrame, so the assignment works correctly:

df['new'] = df.groupby('group')['data'].transform('sum')
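A minimal, self-contained version of the line above, assuming only that pandas is installed and using the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'data': [5, 10, 100, 30]},
                  columns=['group', 'data'])

# transform('sum') returns one value per original row, indexed like df,
# so the new column lines up row by row
df['new'] = df.groupby('group')['data'].transform('sum')
print(df)
```

Each row receives the total of its own group (15 for 'a', 130 for 'b').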

That is because when you assign a Series to a column, its values are aligned by index. If the index values differ, you get NaNs:

print (df.groupby('group')['data'].sum())
group
a     15
b    130
Name: data, dtype: int64

The index values differ, so you get NaNs:

print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')

print (df.index)
RangeIndex(start=0, stop=4, step=1)
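To see the NaNs directly, here is a small sketch that assigns the aggregated Series straight to a new column while df still has its default RangeIndex (the name `sums` is just an illustrative variable):

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'data': [5, 10, 100, 30]})

sums = df.groupby('group')['data'].sum()   # index: ['a', 'b']

# Assignment aligns on index labels: none of 0..3 appear in ['a', 'b'],
# so every row of the new column is NaN
df['new'] = sums
print(df['new'].isna().all())
```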

If you move group into the index of df first, the aggregation is unchanged:

df.set_index('group', inplace=True)

print (df.groupby('group')['data'].sum())
group
a     15
b    130
Name: data, dtype: int64

Now the indexes can align, because their values match:

print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')

print (df.index)
Index(['a', 'a', 'b', 'b'], dtype='object', name='group')
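A sketch of the same assignment once group is the index, assuming the question's data; `sums` is an illustrative name. `groupby('group')` here groups by the index level named group:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'data': [5, 10, 100, 30]})
df = df.set_index('group')   # index: ['a', 'a', 'b', 'b']

# Group by the index level named 'group'; result index: ['a', 'b']
sums = df.groupby('group')['data'].sum()

# The labels now match, so each row picks up the total for its own label
df['new'] = sums
print(df)
```

The aggregated Series has unique labels, so pandas can spread each total over the repeated labels in df.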

Solution 2:

You're not getting what you want because df.groupby('group')['data'].sum() returns an aggregated result with group as the index:

group
a     15
b    130
Name: data, dtype: int64

Its index (a, b) clearly does not align with the DataFrame's index (0 to 3).
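Another way to bring the per-group totals back to the rows is Series.map. Neither answer uses it; it is a swapped-in alternative, shown here as a sketch on the question's data. Each value in the group column is looked up in the index of the aggregated Series, so the result is indexed like df:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'data': [5, 10, 100, 30]})

sums = df.groupby('group')['data'].sum()   # indexed by group label

# Series.map with a Series argument uses that Series' index as lookup keys,
# so the mapped result keeps df's RangeIndex and assigns without NaNs
df['new'] = df['group'].map(sums)
print(df)
```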

If you want this to work, you'll have to use transform, which returns a Series of transformed values with the same axis length as self (the original DataFrame):

df['new'] = df.groupby('group')['data'].transform('sum')

   group  data  new
0     a     5   15
1     a    10   15
2     b   100  130
3     b    30  130
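The difference between the two calls can be checked directly. This sketch (variable names `agg` and `trans` are illustrative) compares the lengths and confirms that the transform result keeps the DataFrame's own index:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'data': [5, 10, 100, 30]})

agg = df.groupby('group')['data'].sum()               # one row per group
trans = df.groupby('group')['data'].transform('sum')  # one row per original row

print(len(agg), len(trans))          # 2 4
print(trans.index.equals(df.index))  # True
```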
