Find Top 3 In Columns Of A Dataframe Using Pandas
I have a time series dataset which looks like this:
Date Newspaper City1 City2 Region1Total City3 City4 Region2Total
2017-12-01 NewsPaper1 231563 8696 240
Solution 1:
First, filter the dataframe so that it contains only the newspaper rows, not the total row.
dff = df.loc[df['Newspaper']!='Total']
Then, for City1, you can do:
dff[['Newspaper', 'City1']].sort_values(['City1'], ascending=False).head(3)
Output:
Newspaper City1
0 NewsPaper1 231563
1 NewsPaper2 173009
5 NewsPaper6 137650
Similarly, you can achieve results for all the columns of interest.
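The same idea can be repeated for every city column in a loop. The following is a minimal sketch, not the answerer's own code: it assumes the city columns can be recognised by the prefix 'City', it swaps in DataFrame.nlargest for sort_values(...).head(3), and the 'Total' row in the sample is a made-up stand-in because the question's data is truncated. The names city_cols and top3 are hypothetical.

```python
import pandas as pd

# Sample data: newspaper values reused from Solution 2's setup.
# The 'Total' row is an assumption (column sums), since the question's table is cut off.
df = pd.DataFrame({
    'Newspaper': ['NewsPaper1', 'NewsPaper2', 'NewsPaper3',
                  'NewsPaper4', 'NewsPaper5', 'NewsPaper6', 'Total'],
    'City1': [231563, 173009, 40511, 37770, 5176, 137650, 625679],
    'City2': [8696, 12180, 4600, 2980, 900, 8025, 37381],
})

# Keep only the newspaper rows, as in the first step above.
dff = df.loc[df['Newspaper'] != 'Total']

# Assumption: every column of interest starts with 'City'.
city_cols = [c for c in dff.columns if c.startswith('City')]

# One small dataframe per city, holding the three largest values.
top3 = {c: dff.nlargest(3, c)[['Newspaper', c]] for c in city_cols}

for city, frame in top3.items():
    print(frame)
```

Each entry of top3 can then be used on its own, for example top3['City2'] for the second city.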
Solution 2:
import pandas as pd

# Set up the data
data = pd.DataFrame({
    'Date': {0: '2017-12-01', 1: '2017-12-01', 2: '2017-12-01',
             3: '2017-12-01', 4: '2017-12-01', 5: '2017-12-01'},
    'Newspaper': {0: 'NewsPaper1', 1: 'NewsPaper2', 2: 'NewsPaper3',
                  3: 'NewsPaper4', 4: 'NewsPaper5', 5: 'NewsPaper6'},
    'City1': {0: 231563, 1: 173009, 2: 40511, 3: 37770, 4: 5176, 5: 137650},
    'City2': {0: 8696, 1: 12180, 2: 4600, 3: 2980, 4: 900, 5: 8025},
    'Region1Total': {0: 240259, 1: 185189, 2: 45111, 3: 40750, 4: 6076, 5: 145675},
    'City3': {0: 21072, 1: 28910, 2: 5040, 3: 6520, 4: 1790, 5: 25300},
    'City4': {0: 8998, 1: 5550, 2: 3330, 3: 1880, 4: 5000, 5: 11000},
    'Region2Total': {0: 30070, 1: 34460, 2: 8370, 3: 8400, 4: 6790, 5: 36300},
})

# Not all columns are required, only the Newspaper and any 'City' column
cleaned_data = data[[i for i in data.columns if 'City' in i] + ['Newspaper']]

# Change the structure: a Series indexed by (city, Newspaper)
df = cleaned_data.set_index('Newspaper').unstack()

# Get the top 3 values for each city
df = df.groupby(level=0).apply(lambda s: s.sort_values(ascending=False)[:3])
df.index = df.index.droplevel(0)
df

Out[]:
       Newspaper
City1  NewsPaper1    231563
       NewsPaper2    173009
       NewsPaper6    137650
City2  NewsPaper2     12180
       NewsPaper1      8696
       NewsPaper6      8025
City3  NewsPaper2     28910
       NewsPaper6     25300
       NewsPaper1     21072
City4  NewsPaper6     11000
       NewsPaper1      8998
       NewsPaper2      5550
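As an alternative to the unstack and apply steps, the same reshaping can be sketched with melt followed by a grouped head(3). This is only one possible variant under stated assumptions, not part of the original answer: the column names 'City' and 'Count' and the variable names long_df and top3 are made up for illustration, and the data is the city subset of the setup above.

```python
import pandas as pd

# City columns only, copied from the setup above.
data = pd.DataFrame({
    'Newspaper': ['NewsPaper1', 'NewsPaper2', 'NewsPaper3',
                  'NewsPaper4', 'NewsPaper5', 'NewsPaper6'],
    'City1': [231563, 173009, 40511, 37770, 5176, 137650],
    'City2': [8696, 12180, 4600, 2980, 900, 8025],
    'City3': [21072, 28910, 5040, 6520, 1790, 25300],
    'City4': [8998, 5550, 3330, 1880, 5000, 11000],
})

# Wide to long: one row per (Newspaper, City) pair.
# 'City' and 'Count' are hypothetical column names chosen for this sketch.
long_df = data.melt(id_vars='Newspaper', var_name='City', value_name='Count')

# Sort each city's values in descending order, then keep the first 3 rows per city.
top3 = (
    long_df.sort_values(['City', 'Count'], ascending=[True, False])
           .groupby('City')
           .head(3)
           .set_index(['City', 'Newspaper'])['Count']
)

print(top3)
```

The result is a Series indexed by (City, Newspaper), so top3.loc['City3'] returns the three largest values for that city.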