Groupby Sum, Index Vs. Column Results
For the following dataframe:
df = pd.DataFrame({'group':['a','a','b','b'], 'data':[5,10,100,30]},columns=['group', 'data'])
print(df)
  group  data
0     a     5
1     a    10
2
Solution 1:
A better approach here is to use GroupBy.transform, which returns a Series the same size as the original DataFrame, so the assignment works correctly:
df['new'] = df.groupby('group')['data'].transform('sum')
This is because, when you assign a new Series, its values are aligned by index. If the index is different, you get NaNs:
print (df.groupby('group')['data'].sum())
group
a     15
b    130
Name: data, dtype: int64
The index values are different, so the assignment gives NaNs:
print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')
print (df.index)
RangeIndex(start=0, stop=4, step=1)
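To make the alignment problem concrete, here is a minimal sketch that assumes the direct assignment was something like df['misaligned'] = df.groupby('group')['data'].sum(). The column name misaligned and the variable sums are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]},
                  columns=['group', 'data'])

# The aggregated result is indexed by group label ('a', 'b'), not by row position.
sums = df.groupby('group')['data'].sum()

# Assignment aligns on index: the labels 0..3 never match 'a'/'b', so every row is NaN.
# 'misaligned' is a hypothetical column name used only for this demonstration.
df['misaligned'] = sums

# transform returns one value per original row, so the index already matches df.index.
df['new'] = df.groupby('group')['data'].transform('sum')

print(df)
```

The misaligned column should come out entirely NaN, while new holds the per-group sum on every row.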
df.set_index('group', inplace=True)
print (df.groupby('group')['data'].sum())
group
a     15
b    130
Name: data, dtype: int64
Now the indexes can align, because the values match:
print (df.groupby('group')['data'].sum().index)
Index(['a', 'b'], dtype='object', name='group')
print (df.index)
Index(['a', 'a', 'b', 'b'], dtype='object', name='group')
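As a rough sketch of what this means in practice: once group is the index, assigning the aggregated Series directly should broadcast each sum to every row carrying that label. The variable name indexed is an assumption, and groupby(level='group') is used here only to make explicit that the grouping is on the index:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]},
                  columns=['group', 'data'])

# Work on a copy indexed by 'group' (hypothetical name) instead of mutating df in place.
indexed = df.set_index('group')

# The aggregated Series has the unique index ['a', 'b'];
# assignment looks up each row label of `indexed` in it.
indexed['new'] = indexed.groupby(level='group')['data'].sum()

print(indexed)
```

This works because the aggregated index is unique, so each row label maps to exactly one sum. The trade-off is that group stops being a regular column, which is why transform is usually the more convenient choice.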
Solution 2:
You're not getting what you want because df.groupby('group')['data'].sum() returns an aggregated result with group as the index:
group
a     15
b    130
Name: data, dtype: int64
Here the indexes are clearly not aligned with those of the original DataFrame.
For this to work you'll have to use transform, which returns a Series of transformed values with the same axis length as self:
df['new'] = df.groupby('group')['data'].transform('sum')
   group  data  new
0     a     5   15
1     a    10   15
2     b   100  130
3     b    30  130
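If you already have the aggregated result and want to keep the default RangeIndex, one alternative that neither answer above uses is to look up each row's group in the aggregated Series with Series.map. This is a sketch under that assumption; the names sums and new_via_map are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'data': [5, 10, 100, 30]},
                  columns=['group', 'data'])

# Aggregated sums, indexed by group label.
sums = df.groupby('group')['data'].sum()

# map looks up each value of df['group'] in sums' index and keeps df's own row index.
df['new_via_map'] = df['group'].map(sums)

# For comparison: the transform approach from the answers.
df['new'] = df.groupby('group')['data'].transform('sum')

print(df)
```

Both columns should hold the same values here. transform is shorter when you only need the broadcast result, while map can be handy if the aggregated Series is also needed elsewhere.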