Finding Rows With The Maximum Values Within A Group
Solution 1:
Ok, your code is very dirty and I think you have over-engineered your solution, so I will simply give you an example of how I would do this conceptually, using cleaner example code.
My example dataframe:
   a  b  c  othervalue
0  1  a  z         100
1  1  b  x         101
2  1  c  y         102
3  2  d  v         103
4  2  e  u         104
5  2  f  t         105
Using idxmax, we can get the index label of the row with the highest value in each group. (In older pandas versions Series.argmax did this; in current versions argmax returns an integer position instead, so idxmax is the method to use.)
df.groupby('a').agg({'othervalue': 'idxmax'})
   othervalue
a
1           2
2           5
Now we can use those labels with the .loc indexer to get the whole rows from the original dataframe.
max_scores = df.groupby('a').agg({'othervalue': 'idxmax'})['othervalue']
df.loc[max_scores]
   a  b  c  othervalue
2  1  c  y         102
5  2  f  t         105
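For anyone who wants to run this end to end, here is a minimal self-contained sketch of the steps above. The dataframe is rebuilt from the example table, and the variable names max_idx and top_rows are only illustrative, not part of the original answer.

```python
import pandas as pd

# Rebuild the six-row example dataframe shown above.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": ["a", "b", "c", "d", "e", "f"],
    "c": ["z", "x", "y", "v", "u", "t"],
    "othervalue": [100, 101, 102, 103, 104, 105],
})

# Index label of the row holding the largest 'othervalue' in each group of 'a'.
max_idx = df.groupby("a")["othervalue"].idxmax()

# Use those labels to pull the complete rows out of the original dataframe.
top_rows = df.loc[max_idx]
print(top_rows)
```

Note that idxmax returns only one label per group (the first occurrence), so this approach picks a single row even when several rows share the maximum.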
Multiple rows with maximum value (Question extension)
If multiple rows share the maximum value within a group, you need a slightly different approach with one extra step.
    a  b  c  othervalue
0   1  a  z         100
1   1  b  x         101
2   1  c  y         102
3   2  d  v         103
4   2  e  u         104
5   2  f  t         105
6   1  a  z         100
7   1  b  x         101
8   1  c  y         102
9   2  d  v         103
10  2  e  u         104
11  2  f  t         105
With the above example, first we get the maximum values in each group, and reset the index so we can use it for the coming merge.
maxvalues_per_group = df.groupby('a').agg({'othervalue': 'max'})
maxvalues_per_group.reset_index(inplace=True)
With these values, we merge with the original dataframe again to get all rows that match the maximum value in each group.
df.merge(maxvalues_per_group, on=['a', 'othervalue'], how='inner')
   a  b  c  othervalue
0  1  c  y         102
1  1  c  y         102
2  2  f  t         105
3  2  f  t         105
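The tie-handling steps can be sketched as a runnable script. This assumes the twelve-row example is just the six-row table repeated twice. The second half shows an alternative that is not in the answer above: broadcasting each group's maximum with transform('max') and filtering with a boolean mask, which keeps the original index labels. The names merged, group_max and masked are only illustrative.

```python
import pandas as pd

base = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": ["a", "b", "c", "d", "e", "f"],
    "c": ["z", "x", "y", "v", "u", "t"],
    "othervalue": [100, 101, 102, 103, 104, 105],
})
# Repeat the six rows so that every group has two rows tied for the maximum.
df = pd.concat([base, base], ignore_index=True)

# Merge approach: maximum per group, then inner-join back onto the original rows.
maxvalues_per_group = df.groupby("a")["othervalue"].max().reset_index()
merged = df.merge(maxvalues_per_group, on=["a", "othervalue"], how="inner")

# Alternative (an assumption, not from the answer): compare each row against
# its own group's maximum and keep the rows where they are equal.
group_max = df.groupby("a")["othervalue"].transform("max")
masked = df[df["othervalue"] == group_max]

print(merged)
print(masked)
```

Both approaches return the same four rows. The merge result gets a fresh 0..3 index, while the mask keeps the original row labels, which can be handy if you need to refer back to the source dataframe.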