Groupby Sum, Index Vs. Column Results
For the following dataframe:
df = pd.DataFrame({'group':['a','a','b','b'], 'data':[5,10,100,30]}, columns=['group', 'data'])
print(df)
group data
0 a 5
1 a 10
2
Solution 1:
It is better to use GroupBy.transform here, because it returns a Series of the same length as the original DataFrame, so the assignment works correctly:
df['new'] = df.groupby('group')['data'].transform('sum')
This is because assigning a Series to a new column aligns its values by index. If the index values differ, you get NaNs:
print (df.groupby('group')['data'].sum())
group
a 15
b 130
Name: data, dtype: int64
The index values are different, so the assignment produces NaNs:
print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')
print (df.index)
RangeIndex(start=0, stop=4, step=1)
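To make the misalignment concrete, here is a minimal sketch that rebuilds the DataFrame from the question and assigns the aggregated sums directly. The names sums and broken are illustrative only and do not appear in the original answers.

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]})

# The aggregated result is indexed by group label ('a', 'b'),
# while df is indexed by row position (0..3).
sums = df.groupby('group')['data'].sum()

# Column assignment aligns on index labels; none of 0..3 match 'a' or 'b',
# so every row of the new column ends up as NaN.
broken = df.assign(new=sums)
print(broken)
```

Using assign here leaves df itself untouched, which makes it easy to compare against the transform-based result.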
df.set_index('group', inplace=True)
print (df.groupby('group')['data'].sum())
group
a 15
b 130
Name: data, dtype: int64
Now the indexes can align, because the values match:
print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')
print (df.index)
Index(['a', 'a', 'b', 'b'], dtype='object', name='group')
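As a rough check of the aligned case, the sketch below moves group into the index and then assigns the aggregated sums directly. The names indexed and sums are illustrative, and groupby(level='group') is used here as an assumed equivalent of grouping by the index name.

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]})

# Move 'group' into the index so the row labels become 'a', 'a', 'b', 'b'.
indexed = df.set_index('group')

# Group by the index level; the result is indexed by 'a' and 'b'.
sums = indexed.groupby(level='group')['data'].sum()

# Labels now match, so each row picks up the sum of its own group.
indexed['new'] = sums
print(indexed)
```

This works because the aggregated index is unique, so each repeated label in the DataFrame index finds exactly one matching value.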
Solution 2:
You're not getting what you want because df.groupby('group')['data'].sum() returns an aggregated result with group as its index:
group
a 15
b 130
Name: data, dtype: int64
Clearly, this index is not aligned with the index of df.
To make this work you'll have to use transform, which returns a Series of transformed values with the same axis length as the original data:
df['new'] = df.groupby('group')['data'].transform('sum')
group data new
0 a 5 15
1 a 10 15
2 b 100 130
3 b 30 130
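One way to convince yourself that transform keeps the original row index is the sketch below. The cross-check with Series.map is not part of the answers above; it is an assumed alternative included only for comparison, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]})

# transform broadcasts each group's sum back onto that group's rows.
via_transform = df.groupby('group')['data'].transform('sum')

# Cross-check (not from the original answers): look up each row's
# group label in the aggregated sums.
via_map = df['group'].map(df.groupby('group')['data'].sum())

# The transformed Series shares df's index, so assignment aligns row by row.
print(via_transform.index.equals(df.index))
print(via_transform.tolist() == via_map.tolist())
```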