Pandas Function Operations

June 22, 2024

Data is from the United States Census Bureau. Counties are political and geographic subdivisions of states in the United States. This dataset contains population data for counties

Solution 1:

Here is the answer that worked for me:

def answer_five():
    return census_df.groupby(["STNAME"],sort=False).sum()["COUNTY"].idxmax()

The first part creates the aggregated DataFrame:

census_df.groupby(["STNAME"],sort=False).sum()

The second part selects the column you need

["COUNTY"].idxmax()

and returns the index label (here, the state name) of the row holding the maximum value.

Solution 2:

Just a correction to your entire code.

First, according to the source, SUMLEV of 50 means the row is a county. Two ways to answer this.

Thought process (think of it like in Excel): You want to count the number of "county rows" in each state group. First, you create the mask/condition to select all SUMLEV == 50 ("county rows"). Then group them by STNAME. Then use .size() to count the number of rows in each grouping.

# this is it!
def answer_five():
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME').size().idxmax()
    return max_index

# not so elegant
def answer_five():
    census_df['Counts'] = 1
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME')['Counts'].sum().idxmax()
    return max_index

You are welcome. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html
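
The mask, group, count, idxmax steps described above can be tried end to end on a small made-up frame. This is only a sketch: toy_df and the function name state_with_most_counties are hypothetical stand-ins, and the rows are invented rather than taken from the census file.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df (only the columns used here).
# SUMLEV 40 marks a state summary row, SUMLEV 50 marks a county row.
toy_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama",
               "Texas", "Texas", "Texas", "Texas"],
    "CTYNAME": ["Alabama", "Autauga County", "Baldwin County",
                "Texas", "Anderson County", "Andrews County", "Angelina County"],
})


def state_with_most_counties(df):
    # Keep only the county rows, dropping the state summary rows.
    mask = df["SUMLEV"] == 50
    # Count the remaining rows in each state group.
    counts = df[mask].groupby("STNAME").size()
    # Return the index label (state name) with the largest count.
    return counts.idxmax()


result = state_with_most_counties(toy_df)
print(result)
```

Without the mask, each state's summary row would be counted as one extra "county" in its group.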

Solution 3:

Actually, you can simply count the county rows for each state instead of looking into the county details.

And this should work:

census_df[census_df['SUMLEV']==50].groupby(['STNAME']).size().idxmax()

Solution 4:

def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    x = new_df.groupby('STNAME')
    return x.count()['COUNTY'].idxmax()


answer_five()
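
One difference from the .size() answers is worth keeping in mind: .count() only counts non-missing values in each column, while .size() counts rows. The sketch below uses an invented frame (toy_df is hypothetical, and the missing COUNTY value is made up for illustration) to show where the two can diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: state "A" has one row with a missing COUNTY value.
toy_df = pd.DataFrame({
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [1, 3, np.nan, 1, 3],
})

grouped = toy_df.groupby("STNAME")

# size() counts every row in the group, missing values included.
sizes = grouped.size()

# count() counts only the non-missing entries in each column.
counts = grouped.count()["COUNTY"]

print(sizes)
print(counts)
```

When the selected column has no missing values, both approaches give the same per-state totals.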

Solution 5:

It's the change from .max() to .idxmax() that makes the difference: .idxmax() returns the STNAME label of the largest entry, whereas .max() returns the largest value itself, a large integer.
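
A minimal sketch of that difference, using a small hand-built Series in place of the grouped census result (the Series name and its three entries are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical per-state counts, indexed by state name.
counts = pd.Series({"Alabama": 67, "Georgia": 159, "Texas": 254},
                   name="counties")

# max() returns the largest value in the Series.
largest_value = counts.max()

# idxmax() returns the index label at which that largest value occurs.
largest_label = counts.idxmax()

print(largest_value)
print(largest_label)
```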
