Best Way To Set A Multiindex On A Pandas Dataframe
Solution 1:
Indexing in pandas is easier than this. You do not need to create your own instance of the MultiIndex class.
The pandas DataFrame has a method called .set_index() which takes either a single column as argument or a list of columns. Supplying a list of columns will set a multiindex for you.
Like this:
df.set_index(['Group', 'Year', 'Gender'], inplace=True)
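As a minimal sketch of the call above on a toy dataframe: the column names Group, Year and Gender come from the answer, while the data and the Count column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the three index column names come from the answer.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Year": [2019, 2020, 2019, 2020],
    "Gender": ["F", "M", "F", "M"],
    "Count": [10, 12, 7, 9],
})

# Passing a list of columns builds a MultiIndex from them.
df.set_index(["Group", "Year", "Gender"], inplace=True)

# Look up one row by its full (Group, Year, Gender) key.
row = df.loc[("A", 2020, "M")]
```

After this, df.index is a pandas MultiIndex with three levels, and Count is the only remaining column.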
Note the inplace=True, which I can highly recommend.
When you are dealing with huge dataframes that barely fit in memory, in-place operations can roughly halve your peak memory usage.
Consider this:
df2 = df1.set_index('column') # Don't do this
del df1 # Don't do this
Once both commands have run, memory usage is about the same as before, but only because of the del df1. Between the two commands, two copies of the same dataframe exist, so memory usage is doubled.
Doing this is implicitly the same:
df1 = df1.set_index('column') # Don't do this either
This will still take double the memory compared with doing it in place.
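The difference between the two call styles can be sketched as follows. This only shows which object ends up modified; it does not measure memory, and the toy data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frame; 'column' matches the name used in the answer.
df1 = pd.DataFrame({"column": ["x", "y", "z"], "value": [1, 2, 3]})

# Without inplace, set_index returns a new DataFrame and leaves df1 untouched.
df2 = df1.set_index("column")
new_object = df2 is not df1
original_unchanged = "column" in df1.columns

# With inplace=True, the call returns None and df1 itself is modified.
result = df1.set_index("column", inplace=True)
same_object_modified = df1.index.name == "column"
```

While df1 and df2 both exist, two dataframes are alive at once, which is the situation the answer warns about.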