In Pandas, How Do I Flatten A Group Of Rows
I am new to pandas in python and I would be grateful for any help on this. I have been googling and googling but can't seem to crack it. For example, I have a csv file with 6 colum
Solution 1:
Steps:
1) Compute the cumulative counts for the groupby object. Add 1 so that the headers are formatted as per the desired DF.
2) Set the same grouped columns as the index axis along with the computed cumcounts and then unstack it. Additionally, sort the header according to the lowermost level.
3) Rename the multi-index columns and flatten accordingly to obtain a single header.
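# 1) Number the rows within each group, starting at 1
cc = df.groupby(['event','event_date','event_time']).cumcount() + 1
# 2) Index by the group keys plus the counter, unstack the counter, sort the header by it
df = df.set_index(['event','event_date','event_time', cc]).unstack().sort_index(axis=1, level=1)
# 3) Flatten the two-level header into single names such as name_1
df.columns = ['_'.join(map(str,i)) for i in df.columns]
df.reset_index()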
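To try the three steps end to end, here is a minimal sketch. The sample frame is an assumption: the question's CSV is not shown in full, so the rows are reconstructed from the output printed under Solution 2, and the names `keys` and `wide` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown in Solution 2.
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]

# Step 1: number the rows within each group, starting at 1.
cc = df.groupby(keys).cumcount() + 1

# Step 2: move the keys and the counter into the index, then unstack the
# counter (the innermost index level) into the columns.
wide = df.set_index(keys + [cc]).unstack()
# Sort the two-level header by its lowest level (the counter).
wide = wide.sort_index(axis=1, level=1)

# Step 3: flatten ('name', 1) into 'name_1', and so on.
wide.columns = ["_".join(map(str, col)) for col in wide.columns]
wide = wide.reset_index()

print(wide)
```

Groups with fewer rows than the largest group get missing values in the extra columns, so event 2 has no `name_3`, `height_3`, or `age_3` here.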
Solution 2:
You are making a wide table from a long one. Usually in data analysis you would want to do the opposite. Here is a method that first counts the occurrences of each variable (name, height, and age) within a group and then pivots them the way you want.
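# Number the rows within each group, starting at 1
df['group_num'] = df.groupby(['event', 'event_date','event_time']).cumcount() + 1
df = df.sort_values('group_num')
# Stack name, height and age into one long column; the variable names land in 'level_4'
df1 = df.set_index(['event', 'event_date','event_time', 'group_num']).stack().reset_index()
# Build the final column labels, e.g. name_1, height_2
df1['var_names'] = df1['level_4'] + '_' + df1['group_num'].astype(str)
df1 = df1.drop(['group_num', 'level_4'], axis=1)
# Pivot the labels back out into columns
df1.set_index(['event', 'event_date', 'event_time', 'var_names']).squeeze().unstack('var_names')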
var_names                    age_1 age_2 age_3 height_1 height_2 height_3  \
event event_date event_time
1     2015-05-06 14:00          24    55    22      185      176      193
2     2015-05-14 17:00          72    42  None      178      184     None

var_names                      name_1   name_2  name_3
event event_date event_time
1     2015-05-06 14:00       J Bloggs  P Smith  T Kirk
2     2015-05-14 17:00        B Gates  J Mayer    None
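The same long-then-wide approach can be sketched as a self-contained script. This is a variation, not the answer's exact code: it names the stacked levels explicitly (`variable` and `value`, both hypothetical names) instead of relying on the auto-generated `level_4` and `0` labels, and the sample rows are an assumption reconstructed from the output above.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown above.
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]

# Number the rows within each group, starting at 1.
df["group_num"] = df.groupby(keys).cumcount() + 1

# Long format: one row per (group, group_num, variable).
long_df = (
    df.set_index(keys + ["group_num"])
      .rename_axis(columns="variable")  # name the level that stack() creates
      .stack()
      .rename("value")
      .reset_index()
)

# Build the target column labels, e.g. 'name_1', 'height_2'.
long_df["var_names"] = long_df["variable"] + "_" + long_df["group_num"].astype(str)

# Pivot the labels back out into columns.
wide = (
    long_df.drop(columns=["group_num", "variable"])
           .set_index(keys + ["var_names"])["value"]
           .unstack("var_names")
)

print(wide)
```

Naming the levels up front means the later steps do not depend on how many key columns there are, which the `level_4` label does.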