
Make Pandas Groupby Act Similarly To Itertools Groupby

Suppose I have a Python dict of lists like so: {'Grp': ['2' , '6' , '6' , '5' , '5' , '6' , '6' , '7' , '7' , '6'], 'Nums': ['6.20', '6.30', '6.80', '6.45', '6.5

Solution 1:

First, flag the rows whose Grp value differs from the previous row's value, then take the cumulative sum of those flags to label the groups you need:

In [9]:
    diff_to_previous = df.Grp != df.Grp.shift(1)
    diff_to_previous.cumsum()
Out[9]:
0    1
1    2
2    2
3    3
4    3
5    4
6    4
7    5
8    5
9    6

You can then do

df.groupby(diff_to_previous.cumsum()) 

to get the desired groupby object, with one group per run of consecutive equal Grp values.
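As a minimal, self-contained sketch of the steps above: the DataFrame is rebuilt from the sample data in the question, and the names `run_id` and `runs` are illustrative choices, not part of the original answer. It collects each run's Grp value together with its Nums, much as itertools.groupby would.

```python
import pandas as pd

# Sample data copied from the question's DataFrame.
df = pd.DataFrame({
    'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
    'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55',
             '6.35', '6.37', '6.36', '6.78', '6.33'],
})

# True wherever a row's Grp differs from the row above it. The first row is
# compared against a missing value, so it always starts a new run.
diff_to_previous = df['Grp'] != df['Grp'].shift(1)

# The running total of those flags gives every consecutive run its own id.
run_id = diff_to_previous.cumsum()

# Collect (Grp value, Nums in that run) pairs; run ids increase with row
# position, so the groups come back in the original row order.
runs = [
    (rows['Grp'].iat[0], rows['Nums'].tolist())
    for _, rows in df.groupby(run_id)
]

for key, nums in runs:
    print(key, nums)
```

Because the run id only ever increases, the two separate runs of '6' stay in separate groups instead of being merged, which is the behaviour the question asks for.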

Solution 2:

Well, not to be cheeky, but why not just use Python's groupby on the DataFrame by using iterrows? That is what it is there for:

>>> df
  Grp  Nums
0   2  6.20
1   6  6.30
2   6  6.80
3   5  6.45
4   5  6.55
5   6  6.35
6   6  6.37
7   7  6.36
8   7  6.78
9   6  6.33

>>> from itertools import groupby
>>> # iterrows() yields (index, row) pairs, so row[1] is the row itself
>>> for k, l in groupby(df.iterrows(), key=lambda row: row[1]['Grp']):
...     print(k, [t[1]['Nums'] for t in l])

Prints:

2 ['6.20']
6 ['6.30', '6.80']
5 ['6.45', '6.55']
6 ['6.35', '6.37']
7 ['6.36', '6.78']
6 ['6.33']

Trying to make pandas' groupby act the way you want will probably take so many chained methods that you won't be able to follow the code when you reread it in the future.
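If building a row object for every record with iterrows feels heavy, the same itertools approach can be sketched on the two columns zipped together. This is an illustrative variant, not the answer's original code: the plain lists stand in for the DataFrame columns, and the names `pairs` and `runs` are assumptions.

```python
from itertools import groupby
from operator import itemgetter

# The two columns from the question as plain lists. With a DataFrame,
# zip(df['Grp'], df['Nums']) would produce the same pairs.
grp = ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6']
nums = ['6.20', '6.30', '6.80', '6.45', '6.55',
        '6.35', '6.37', '6.36', '6.78', '6.33']

pairs = zip(grp, nums)

# itertools.groupby starts a new group whenever the key changes, so only
# consecutive equal Grp values end up in the same run.
runs = [
    (key, [num for _, num in run])
    for key, run in groupby(pairs, key=itemgetter(0))
]

for key, values in runs:
    print(key, values)
```

This prints the same six lines as the iterrows version above.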

Solution 3:

You basically want to create a new column to index your desired grouping order, and then use that for grouping. You keep the index number the same until the value in Grp changes.

For your data, you would want something like this:

   Grp  Nums new_group
0    2  6.20         1
1    6  6.30         2
2    6  6.80         2
3    5  6.45         3
4    5  6.55         3
5    6  6.35         4
6    6  6.37         4
7    7  6.36         5
8    7  6.78         5
9    6  6.33         6

You can now group on both new_group and Grp:

df.groupby(['new_group', 'Grp']).Nums.groups
{(1, 2): [0],
 (2, 6): [1, 2],
 (3, 5): [3, 4],
 (4, 6): [5, 6],
 (5, 7): [7, 8],
 (6, 6): [9]}

I used this method to create the new column:

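# Build the labels in a plain list, then assign the column once; this avoids
# writing through df.new_group.iat[...], which is chained assignment.
new_group = []
for n, grp in enumerate(df.Grp):
    if n == 0:
        new_group.append(1)                  # first row starts group 1
    elif grp == df.Grp.iat[n - 1]:
        new_group.append(new_group[-1])      # same Grp as previous row
    else:
        new_group.append(new_group[-1] + 1)  # Grp changed: next group
df['new_group'] = new_group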

Note that this answer here has the same idea (thanks @ajcr for the link), but in a much more succinct representation:

>>> df.groupby((df.Grp != df.Grp.shift()).cumsum()).Nums.groups
{1: [0], 2: [1, 2], 3: [3, 4], 4: [5, 6], 5: [7, 8], 6: [9]}
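Once the runs are grouped this way, the usual pandas aggregations apply per run. The following is a hedged sketch using the question's sample data: the `run` label, the `summary` frame, and the `n` and `total` column names are illustrative assumptions, and Nums is converted from strings to floats so it can be summed.

```python
import pandas as pd

# Sample data copied from the question's DataFrame.
df = pd.DataFrame({
    'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
    'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55',
             '6.35', '6.37', '6.36', '6.78', '6.33'],
})

# Run id per row, renamed so it does not clash with the Grp column name.
run_id = (df['Grp'] != df['Grp'].shift()).cumsum().rename('run')

# One row per run: the Grp value, the number of rows, and the sum of Nums.
summary = (
    df.assign(Nums=df['Nums'].astype(float))
      .groupby(run_id)
      .agg(Grp=('Grp', 'first'), n=('Nums', 'count'), total=('Nums', 'sum'))
)

print(summary)
```

Keeping the Grp value as an aggregated column, rather than as a second grouping key, leaves a simple single-level index of run ids.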
