
Best Way To Set A Multiindex On A Pandas Dataframe

I have a DataFrame df with these columns: Group, Year, Gender, Feature_1, Feature_2, Feature_3, ... I want to use a MultiIndex to stack the data later, and I tried this way: df.index = pd

Solution 1:

Indexing in pandas is easier than this. You do not need to create your own instance of the MultiIndex class.

The pandas DataFrame has a method called .set_index() which takes either a single column label or a list of column labels as its argument. Supplying a list of columns will build a MultiIndex for you, with one level per column.

Like this:

df.set_index(['Group', 'Year', 'Gender'], inplace=True)

Note the inplace=True, which I can recommend highly.
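As a minimal sketch of the call above, here is a small frame with made-up values. The column names follow the question, but the data itself is hypothetical, and the final stack() step is only an assumption about what "stack the data later" means in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Year": [2020, 2021, 2020, 2021],
    "Gender": ["F", "M", "F", "M"],
    "Feature_1": [1.0, 2.0, 3.0, 4.0],
    "Feature_2": [10, 20, 30, 40],
})

# Passing a list of columns builds a three-level MultiIndex.
df.set_index(["Group", "Year", "Gender"], inplace=True)

print(df.index.names)        # the three index level names
print(df.columns.tolist())   # only the feature columns remain

# Stacking moves the remaining column labels into a fourth index level.
stacked = df.stack()
print(stacked.head())
```

The indexed columns are removed from the frame by default, so only the feature columns are left to be stacked.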

When you are dealing with huge dataframes that barely fit in memory, in-place operations can literally halve your peak memory usage, because a second copy of the data never has to exist alongside the original.

Consider this:

df2 = df1.set_index('column') # Don't do this
del df1 # Don't do this

Once both commands have run, memory usage will be about the same as before, but only because of the del df1. In the time between the two commands, there are two copies of the same dataframe, and therefore double the memory usage.

Reassigning the result to the same name does implicitly the same thing:

df1 = df1.set_index('column') # Don't do this either

And it will still take double the memory of doing this in place.
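To make the difference between the two call styles concrete, here is a small sketch of how they behave. It does not measure memory, and how much memory the in-place form actually saves may depend on your pandas version. The frame and its 'value' column are hypothetical; 'column' is the placeholder name used above.

```python
import pandas as pd

# Hypothetical frame; 'column' is the placeholder name from the snippets above.
df1 = pd.DataFrame({"column": ["a", "b", "c"], "value": [1, 2, 3]})
df_inplace = df1.copy()

# Reassignment style: set_index returns a new DataFrame and leaves df1 untouched.
reassigned = df1.set_index("column")

# In-place style: the call returns None and modifies df_inplace itself.
result = df_inplace.set_index("column", inplace=True)

print(result is None)                  # the in-place call has no return value
print(reassigned.equals(df_inplace))   # both styles produce the same frame
print("column" in df1.columns)         # the original still has its column
```

Because the in-place call returns None, writing df = df.set_index(..., inplace=True) would replace your dataframe with None, so use one style or the other, not both.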
