Best Way To Set A Multiindex On A Pandas Dataframe
Solution 1:
Indexing in pandas is easier than this. You do not need to create your own instance of the MultiIndex class.
The pandas DataFrame has a method called .set_index() which takes either a single column or a list of columns as its argument. Supplying a list of columns will set a MultiIndex for you.
Like this:
df.set_index(['Group', 'Year', 'Gender'], inplace=True)
Note the inplace=True, which I highly recommend.
When you are dealing with huge dataframes that barely fit in memory, in-place operations can roughly halve your peak memory usage.
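To make the set_index() call above concrete, here is a minimal sketch on a tiny frame. The data values and the "Value" column are made up for illustration; only the Group, Year and Gender column names come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the three index column names are from the answer.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Year": [2019, 2020, 2019, 2020],
    "Gender": ["F", "M", "F", "M"],
    "Value": [10, 20, 30, 40],
})

# Passing a list of column names builds the MultiIndex in one call.
df.set_index(["Group", "Year", "Gender"], inplace=True)

# The three columns are now index levels; only "Value" remains as a column.
index_names = list(df.index.names)
remaining_columns = list(df.columns)

# A row can be selected with a tuple holding one key per index level.
value = df.loc[("A", 2020, "M"), "Value"]
```

After the call, df.index is a MultiIndex with levels Group, Year and Gender, and no MultiIndex object had to be constructed by hand.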
Consider this:
df2 = df1.set_index('column') # Don't do this
del df1 # Don't do this
When this operation is done, the memory usage will be about the same as before, but only because we do del df1. In the time between these two commands, there will be two copies of the same dataframe and therefore double the memory usage.
Doing this is implicitly the same:
df1 = df1.set_index('column') # Don't do this either
This will still take double the memory of doing it in place.
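As a quick check that the variants differ only in how they are written and not in what they produce, the sketch below builds the same small frame twice and compares the in-place form with the reassignment form. The make_frame helper and its data are hypothetical. How much memory the in-place form actually saves may depend on the pandas version, so on a large frame it is worth measuring rather than assuming.

```python
import pandas as pd

def make_frame():
    # Hypothetical helper: returns a fresh, identical small frame on each call.
    return pd.DataFrame({"column": [3, 1, 2], "value": ["c", "a", "b"]})

# Variant 1: modify the frame in place; no second frame is bound to a name.
in_place = make_frame()
in_place.set_index("column", inplace=True)

# Variant 2: build a new frame and rebind the old name to it.
reassigned = make_frame()
reassigned = reassigned.set_index("column")

# Both variants end up with the same index and the same data.
same_result = in_place.equals(reassigned)
```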