
Finding Rows With The Maximum Values Within A Group

I have this dataframe whose entries look like this: In [77]: df.loc[1] Out[77]: img 410T1_B03_S06_W2_cell1_ann.tif immean 1302 imvar

Solution 1:

OK, your code is quite messy and I think you have over-engineered the solution, so I will simply show how I would approach this conceptually, using a cleaner example.

My example dataframe:

   a  b  c  othervalue
0  1  a  z         100
1  1  b  x         101
2  1  c  y         102
3  2  d  v         103
4  2  e  u         104
5  2  f  t         105
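For anyone who wants to follow along, the example dataframe above could be built roughly like this. It is only a sketch that reproduces the table as printed; the name df matches the snippets in this answer.

```python
import pandas as pd

# Rebuild the example table: two groups in column 'a', three rows each.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": ["a", "b", "c", "d", "e", "f"],
    "c": ["z", "x", "y", "v", "u", "t"],
    "othervalue": [100, 101, 102, 103, 104, 105],
})

print(df)
```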

Using idxmax, we can get the index label of the row with the highest value in each group. (In older pandas versions pd.Series.argmax did this; in current versions argmax returns an integer position rather than a label, so idxmax is the one to use here.)

df.groupby('a').agg({'othervalue': 'idxmax'})
   othervalue
a            
1           2
2           5

Now we can pass those labels to .loc to get the whole rows from the original dataframe.

max_scores = df.groupby('a').agg({'othervalue': 'idxmax'})['othervalue']
df.loc[max_scores]
   a  b  c  othervalue
2  1  c  y         102
5  2  f  t         105
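Put together, the two steps might look like the sketch below on a current pandas version. It assumes the example dataframe from this answer and selects the column before calling idxmax, which is a slightly shorter spelling of the same idea. The names idx and top_rows are only illustrative. Note that idxmax keeps just the first row when several rows share the maximum.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": ["a", "b", "c", "d", "e", "f"],
    "c": ["z", "x", "y", "v", "u", "t"],
    "othervalue": [100, 101, 102, 103, 104, 105],
})

# Index label of the (first) maximum of 'othervalue' within each group of 'a'.
idx = df.groupby("a")["othervalue"].idxmax()

# Look up the full rows in the original dataframe by those labels.
top_rows = df.loc[idx]
print(top_rows)
```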

Multiple rows with maximum value (Question extension)

If several rows share the maximum value within a group, you need a slightly different approach with one extra step.

    a  b  c  othervalue
0   1  a  z         100
1   1  b  x         101
2   1  c  y         102
3   2  d  v         103
4   2  e  u         104
5   2  f  t         105
6   1  a  z         100
7   1  b  x         101
8   1  c  y         102
9   2  d  v         103
10  2  e  u         104
11  2  f  t         105

With the above example, first we get the maximum values in each group, and reset the index so we can use it for the coming merge.

# 'max' replaces pd.np.max: the pd.np alias is no longer available in current pandas
maxvalues_per_group = df.groupby('a').agg({'othervalue': 'max'})
maxvalues_per_group.reset_index(inplace=True)

With these values, we merge against the original dataframe to get all rows that match the maximum value of their group.

df.merge(on=['a', 'othervalue'], right=maxvalues_per_group, how='inner')

   a  b  c  othervalue
0  1  c  y         102
1  1  c  y         102
2  2  f  t         105
3  2  f  t         105
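As an alternative to the merge (not what the answer above uses), the same tied rows can be selected with a boolean mask built from groupby(...).transform('max'). This is a minimal sketch on the twelve-row example; group_max and tied_rows are illustrative names. Unlike the merge, it keeps the original index labels.

```python
import pandas as pd

# The twelve-row example: the six original rows repeated once.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2] * 2,
    "b": ["a", "b", "c", "d", "e", "f"] * 2,
    "c": ["z", "x", "y", "v", "u", "t"] * 2,
    "othervalue": [100, 101, 102, 103, 104, 105] * 2,
})

# Broadcast each group's maximum back onto every row of that group.
group_max = df.groupby("a")["othervalue"].transform("max")

# Keep every row whose value equals its group's maximum, ties included.
tied_rows = df[df["othervalue"] == group_max]
print(tied_rows)
```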
