Pandas Dataframe Groupby - Displaying Group Statistics
For the Pandas dataframe:
import pandas as pd
codes = ['one', 'two', 'three']
colours = ['black', 'white']
textures = ['soft', 'hard']
N = 100  # length of the dataframe
df = pd.Dat
Solution 1:
You can pass a list of functions to be applied to the group, e.g.:
grouped = df.groupby(['code', 'colour'])['size'].agg(['sum', 'mean', 'size', 'idxmax']).reset_index()
Since idxmax returns the index label of the row holding the maximum in each group, you will need to look those rows up on the original dataframe. Use df.loc for this, because df.ix was removed in pandas 1.0:
grouped['max_row_id'] = df.loc[grouped['idxmax'], 'id'].values
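To make the lookup step concrete, here is a minimal sketch on a small hand-made dataframe. The question's own dataframe construction is cut off, so the six rows below and their values are assumptions chosen purely for illustration; only the column names (id, code, colour, size) follow the answer.

```python
import pandas as pd

# Hypothetical stand-in data: the original dataframe is not fully shown,
# so these rows are invented for illustration only.
df = pd.DataFrame({
    "id": [10, 11, 12, 13, 14, 15],
    "code": ["one", "one", "one", "two", "two", "two"],
    "colour": ["black", "black", "white", "black", "white", "white"],
    "size": [3, 7, 5, 2, 9, 4],
})

# Apply several aggregations to the 'size' column of each (code, colour) group.
grouped = (
    df.groupby(["code", "colour"])["size"]
      .agg(["sum", "mean", "size", "idxmax"])
      .reset_index()
)

# 'idxmax' holds index labels of df, so .loc maps them back to the 'id' column.
# .to_numpy() drops the index so the values are assigned by position.
grouped["max_row_id"] = df.loc[grouped["idxmax"], "id"].to_numpy()

print(grouped)
```

Dropping the index with to_numpy() matters here: without it, pandas would try to align the looked-up ids on df's row labels rather than on the rows of grouped.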
NOTE: I selected the 'size' column because all the functions apply to that column. If you want a different set of functions for different columns, pass agg a dictionary that maps each column to a list of functions, e.g. agg({'size': ['sum', 'mean', 'idxmax']}). This results in MultiIndex columns, which means that when getting the IDs for the maximum size in each group you need to do:
grouped['max_row_id'] = df.loc[grouped['size']['idxmax'], 'id'].values
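The dictionary form can be sketched the same way. As before, the sample rows are assumptions made up for illustration, since the original dataframe is not fully shown; the point is only how the two-level column names are addressed.

```python
import pandas as pd

# Hypothetical stand-in data, invented for illustration only.
df = pd.DataFrame({
    "id": [10, 11, 12, 13, 14, 15],
    "code": ["one", "one", "one", "two", "two", "two"],
    "colour": ["black", "black", "white", "black", "white", "white"],
    "size": [3, 7, 5, 2, 9, 4],
})

# A dict of lists yields two-level (MultiIndex) columns such as ('size', 'sum').
grouped = (
    df.groupby(["code", "colour"])
      .agg({"size": ["sum", "mean", "idxmax"]})
      .reset_index()
)

# Address the aggregated column by its (column, function) pair,
# then map the stored index labels back to the 'id' column of df.
grouped["max_row_id"] = df.loc[grouped[("size", "idxmax")], "id"].to_numpy()

print(grouped.columns.tolist())
print(grouped)
```

Indexing with the tuple ("size", "idxmax") is equivalent to grouped['size']['idxmax'] but does the selection in one step.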