Pandas Function Operations
Solution 1:
Here is the answer that worked for me:
def answer_five():
    return census_df.groupby(["STNAME"], sort=False).sum()["COUNTY"].idxmax()
The first part creates the aggregated DataFrame, with one row per state:
census_df.groupby(["STNAME"], sort=False).sum()
The second part takes the column you need
["COUNTY"].idxmax()
and returns the index label (here the state name) of the row holding the maximum value.
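To see the two parts separately, here is a minimal sketch on made-up data. The frame `toy_df` and the state names are hypothetical stand-ins for `census_df`, not the real census file. Keep in mind that `.sum()` adds up the values stored in COUNTY rather than counting rows, so the result only ranks states by number of counties if those values happen to grow with the county count.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df (not the real census file).
toy_df = pd.DataFrame({
    "STNAME": ["Alpha", "Alpha", "Alpha", "Beta", "Beta"],
    "COUNTY": [1, 3, 5, 1, 3],
})

# First part: one row per state, with COUNTY summed within each group.
aggregated = toy_df.groupby(["STNAME"], sort=False).sum()

# Second part: select the column, then ask for the index label of its maximum.
county_sums = aggregated["COUNTY"]
result = county_sums.idxmax()

print(county_sums.to_dict())
print(result)
```

The index of `aggregated` is STNAME, which is why `idxmax()` hands back a state name and not a row number.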
Solution 2:
Just a correction to your entire code.
First, according to the source, a SUMLEV of 50 means the row is a county. There are two ways to answer this.
Thought process (think of it like in Excel):
You want to count the number of "county rows" in each state group.
First, create the mask/condition to select all rows where SUMLEV == 50 ("county rows").
Then group them by STNAME.
Then use .size() to count the number of rows in each group.
# this is it!
def answer_five():
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME').size().idxmax()
    return max_index

# not so elegant
def answer_five():
    census_df['Counts'] = 1
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME')['Counts'].sum().idxmax()
    return max_index
You are welcome. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html
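If you want to try the mask, group, size steps from this solution without the census file, something like the following should do. The frame `toy_df` and its state names are invented for illustration; only the SUMLEV and STNAME column names come from the question.

```python
import pandas as pd

# Hypothetical toy data: SUMLEV 40 marks a state summary row, 50 a county row.
toy_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta"],
})

# Step 1: boolean mask that keeps only the county rows.
mask = toy_df["SUMLEV"] == 50

# Step 2 and 3: group the county rows by state and count rows per group.
county_counts = toy_df[mask].groupby("STNAME").size()

# Step 4: the index label of the largest count is the state name.
most_counties = county_counts.idxmax()

print(county_counts.to_dict())
print(most_counties)
```

Without the mask, the state summary rows (SUMLEV 40) would be counted as well, adding one extra row to every state.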
Solution 3:
Actually, you can just count the rows at the state level instead of looking into the county details.
And this should work:
census_df[census_df['SUMLEV']==50].groupby(['STNAME']).size().idxmax()
Solution 4:
def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    x = new_df.groupby('STNAME')
    return x.count()['COUNTY'].idxmax()

answer_five()
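One thing worth knowing about this variant: `.count()` counts non-null values per column, while `.size()` counts rows. A small sketch on made-up data (the frame `toy_df` and the missing value are assumptions for illustration) shows where the two differ. When the COUNTY column has no missing values, both give the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing COUNTY value in state "Beta".
toy_df = pd.DataFrame({
    "STNAME": ["Alpha", "Alpha", "Beta", "Beta", "Beta"],
    "COUNTY": [1.0, 3.0, 1.0, np.nan, 5.0],
})

grouped = toy_df.groupby("STNAME")

# count() skips the NaN, so "Beta" is reported with 2 values.
non_null_counts = grouped.count()["COUNTY"]

# size() counts every row in the group, so "Beta" is reported with 3 rows.
row_counts = grouped.size()

print(non_null_counts.to_dict())
print(row_counts.to_dict())
```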
Solution 5:
It's the change from .max() to .idxmax() that returns the correct value for the STNAME rather than a large integer.
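A quick way to see the difference, using a made-up Series of per-state counts (the names and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical per-state counts, indexed by state name.
counts = pd.Series({"Alpha": 2, "Beta": 3, "Gamma": 1})

# max() returns the largest value itself.
largest_value = counts.max()

# idxmax() returns the index label where that largest value sits.
largest_label = counts.idxmax()

print(largest_value)
print(largest_label)
```

So `.max()` answers "how many?" while `.idxmax()` answers "which state?", and the question asks for the latter.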