
Pandas Dataframe Groupby - Displaying Group Statistics

For the Pandas dataframe: import pandas as pd codes = ['one','two','three']; colours = ['black', 'white']; textures = ['soft', 'hard']; N= 100 # length of the dataframe df = pd.Dat

Solution 1:

You can pass a list of functions to be applied to the group, e.g.:

grouped = df.groupby(['code', 'colour'])['size'].agg(['sum', 'mean', 'size', 'idxmax']).reset_index()

Since idxmax is the index label of the row holding each group's maximum, you will need to look those rows up in the original dataframe:

grouped['max_row_id'] = df.loc[grouped['idxmax'], 'id'].to_numpy()
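The two steps above can be sketched end to end as follows. This is only an illustration: the sample values are made up, and it assumes the dataframe has an id column and a default integer index. It uses the string name 'idxmax' to get the index label of each group's maximum.

```python
import pandas as pd

# Hypothetical sample data; the question's full dataframe is not shown.
df = pd.DataFrame({
    'id':     [10, 11, 12, 13, 14, 15],
    'code':   ['one', 'one', 'one', 'two', 'two', 'two'],
    'colour': ['black', 'black', 'white', 'black', 'white', 'white'],
    'size':   [5, 9, 3, 7, 2, 8],
})

# Apply several aggregations to the 'size' column of each (code, colour) group.
# Each function name becomes a column in the result.
grouped = (
    df.groupby(['code', 'colour'])['size']
      .agg(['sum', 'mean', 'size', 'idxmax'])
      .reset_index()
)

# 'idxmax' holds index labels of df, so use .loc to fetch the matching ids.
# to_numpy() drops the looked-up index so values are assigned by position.
grouped['max_row_id'] = df.loc[grouped['idxmax'], 'id'].to_numpy()

print(grouped)
```

Converting the looked-up ids to a plain array avoids index alignment, because the lookup result is indexed by the original row labels rather than by the rows of grouped.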

NOTE: I selected the 'size' column because all the functions apply to that column. If you want a different set of functions for different columns, you can pass agg a dictionary mapping each column to a list of functions, e.g. agg({'size': ['sum', 'mean', 'idxmax']}). This results in MultiIndex columns, which means that when getting the IDs for the maximum size in each group you need to select the column by both levels:

grouped['max_row_id'] = df.loc[grouped[('size', 'idxmax')], 'id'].to_numpy()
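For the dictionary form, a minimal sketch might look like this. The data is again invented for illustration, and it assumes an id column and a default integer index. It prints the two-level column labels so you can see why the lookup needs a (column, function) tuple.

```python
import pandas as pd

# Hypothetical sample data, made up for this example.
df = pd.DataFrame({
    'id':     [10, 11, 12, 13],
    'code':   ['one', 'one', 'two', 'two'],
    'colour': ['black', 'black', 'white', 'white'],
    'size':   [5, 9, 8, 2],
})

# A dict of lists produces two-level columns of the form (column, function).
grouped = (
    df.groupby(['code', 'colour'])
      .agg({'size': ['sum', 'mean', 'idxmax']})
      .reset_index()
)
print(grouped.columns.tolist())

# Select the idxmax column with a tuple, then look the ids up in df.
max_labels = grouped[('size', 'idxmax')]
grouped['max_row_id'] = df.loc[max_labels, 'id'].to_numpy()

print(grouped)
```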
