How To Assign A Unique ID For Different Groups In Pandas Dataframe?
Solution 1:
Sort by Name and Datetime with sort_values and find the time difference ('td') between successive actions of the same Name. Taking the cumsum of the Boolean Series (td greater than 30 minutes) forms groups of successive actions, each within 30 minutes of the previous one, and ngroup labels the groups.
The sort_index before the groupby can be removed if you don't care which label each group gets, but keeping it ensures the labels are numbered in the order the groups first appear in the original frame.
import pandas as pd

df = df.sort_values(['Name', 'Datetime'])
# Only calculate diff within same Name
df['td'] = df.Datetime.diff().mask(df.Name.ne(df.Name.shift()))
df['Id'] = (df.sort_index()
              .groupby(['Name', df['td'].gt(pd.Timedelta('30min')).cumsum()], sort=False)
              .ngroup()+1)
df = df.sort_index()
Output (td left in for clarity):
     Name             Datetime        td  Id
0     Bob  2018-04-26 12:00:00       NaT   1
1  Claire  2018-04-26 12:00:00       NaT   2
2     Bob  2018-04-26 12:10:00  00:10:00   1
3     Bob  2018-04-26 12:30:00  00:20:00   1
4   Grace  2018-04-27 08:30:00       NaT   3
5     Bob  2018-04-27 09:30:00  21:00:00   4
6     Bob  2018-04-27 09:40:00  00:10:00   4
7     Bob  2018-04-27 10:00:00  00:20:00   4
8     Bob  2018-04-27 10:30:00  00:30:00   4
9     Bob  2018-04-27 11:30:00  01:00:00   5
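To try Solution 1 end to end, here is a minimal sketch that rebuilds the sample frame from the output table above and reruns the same steps. The DataFrame construction is an assumption (the question's own setup code is not shown here), and the intermediate name new_run is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output table above (assumed setup).
df = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Grace",
             "Bob", "Bob", "Bob", "Bob", "Bob"],
    "Datetime": pd.to_datetime([
        "2018-04-26 12:00:00", "2018-04-26 12:00:00", "2018-04-26 12:10:00",
        "2018-04-26 12:30:00", "2018-04-27 08:30:00", "2018-04-27 09:30:00",
        "2018-04-27 09:40:00", "2018-04-27 10:00:00", "2018-04-27 10:30:00",
        "2018-04-27 11:30:00",
    ]),
})

df = df.sort_values(["Name", "Datetime"])
# Gap to the previous row, blanked where the previous row has a different Name.
df["td"] = df["Datetime"].diff().mask(df["Name"].ne(df["Name"].shift()))
# A gap strictly greater than 30 minutes starts a new run (illustrative name).
new_run = df["td"].gt(pd.Timedelta("30min")).cumsum().rename("run")
# Group in original row order so labels follow first appearance.
df["Id"] = df.sort_index().groupby(["Name", new_run], sort=False).ngroup() + 1
df = df.sort_index()

ids = df["Id"].tolist()
print(ids)  # → [1, 2, 1, 1, 3, 4, 4, 4, 4, 5]
```

Row 8 stays in group 4 because its gap is exactly 30 minutes and the comparison is strictly greater than.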
Solution 2:
The explanation near the bottom of your question is really helpful for understanding the problem.
You need to groupby on Name and a groupID (don't confuse this groupID with your final Id) and call ngroup to return Id. The main thing is how to define this groupID. To create it, use sort_values to put the rows in ascending order by Name and then Datetime. Group by Name and take the difference in Datetime between consecutive rows within each Name. Use gt to flag differences greater than 30 minutes and cumsum to turn those flags into the groupID. Finally, call sort_index to restore the original row order and assign the result to s, as follows:
s = df.sort_values(['Name','Datetime']).groupby('Name').Datetime.diff() \
.gt(pd.Timedelta(minutes=30)).cumsum().sort_index()
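To see why this groupID is not the final Id, it can help to print s for the sample data. This is a minimal sketch; the DataFrame construction is an assumption reconstructed from the output table, not the question's own setup code.

```python
import pandas as pd

# Sample data reconstructed from the output table (assumed setup).
df = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Grace",
             "Bob", "Bob", "Bob", "Bob", "Bob"],
    "Datetime": pd.to_datetime([
        "2018-04-26 12:00:00", "2018-04-26 12:00:00", "2018-04-26 12:10:00",
        "2018-04-26 12:30:00", "2018-04-27 08:30:00", "2018-04-27 09:30:00",
        "2018-04-27 09:40:00", "2018-04-27 10:00:00", "2018-04-27 10:30:00",
        "2018-04-27 11:30:00",
    ]),
})

s = (df.sort_values(["Name", "Datetime"])
       .groupby("Name")["Datetime"].diff()   # gap within each Name
       .gt(pd.Timedelta(minutes=30))         # True where a new run starts
       .cumsum()                             # running counter over the sorted rows
       .sort_index())                        # back to the original row order

print(s.tolist())  # → [0, 2, 0, 0, 2, 1, 1, 1, 1, 2]
```

The counter runs across the whole sorted frame, so Claire and Grace (rows 1 and 4) carry the same value, 2, as Bob's last row. That is why Name has to be part of the groupby key alongside s.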
Next, groupby Name and s with sort=False to preserve the original order, call ngroup, and add 1.
df['Id'] = df.groupby(['Name', s], sort=False).ngroup().add(1)
Output:
     Name             Datetime  Id
0     Bob  2018-04-26 12:00:00   1
1  Claire  2018-04-26 12:00:00   2
2     Bob  2018-04-26 12:10:00   1
3     Bob  2018-04-26 12:30:00   1
4   Grace  2018-04-27 08:30:00   3
5     Bob  2018-04-27 09:30:00   4
6     Bob  2018-04-27 09:40:00   4
7     Bob  2018-04-27 10:00:00   4
8     Bob  2018-04-27 10:30:00   4
9     Bob  2018-04-27 11:30:00   5
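If the same labelling is needed for other columns or gap sizes, the two steps of Solution 2 could be wrapped in a small helper. This is only a sketch: the function name assign_session_ids and its parameters are hypothetical, not part of pandas or of the original answer, and it assumes the frame has a unique index so that sort_index and the groupby alignment behave as in the example.

```python
import pandas as pd


def assign_session_ids(df, name_col="Name", time_col="Datetime", gap="30min"):
    """Return 1-based group labels (hypothetical helper, assumes a unique index)."""
    run = (df.sort_values([name_col, time_col])
             .groupby(name_col)[time_col].diff()  # gap within each name
             .gt(pd.Timedelta(gap))               # strictly greater than the gap
             .cumsum()
             .sort_index()                        # restore original row order
             .rename("run"))                      # avoid reusing a column name
    # sort=False numbers the groups in order of first appearance.
    return df.groupby([name_col, run], sort=False).ngroup().add(1)


# Sample data reconstructed from the output table (assumed setup).
df = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Grace",
             "Bob", "Bob", "Bob", "Bob", "Bob"],
    "Datetime": pd.to_datetime([
        "2018-04-26 12:00:00", "2018-04-26 12:00:00", "2018-04-26 12:10:00",
        "2018-04-26 12:30:00", "2018-04-27 08:30:00", "2018-04-27 09:30:00",
        "2018-04-27 09:40:00", "2018-04-27 10:00:00", "2018-04-27 10:30:00",
        "2018-04-27 11:30:00",
    ]),
})

df["Id"] = assign_session_ids(df)
print(df["Id"].tolist())  # → [1, 2, 1, 1, 3, 4, 4, 4, 4, 5]
```

Passing a different gap, for example gap="15min", splits Bob's rows into more groups without changing the rest of the logic.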