
Python Pandas: Assign Control Vs. Treatment Groupings Randomly Based On %

I am working on an experiment design in which I need to split a dataframe df into control and treatment groups, using a different percentage split for each pre-existing group. The dataframe df has a customer_id column and a Group column that holds the pre-existing groupings.

Solution 1:

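We can use the numpy.random.choice() method inside groupby(...).transform(), drawing a label for every row with probabilities that depend on the row's group (x.name is the group key inside the lambda):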

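In [160]: import numpy as np
     ...: # Group 1 is split 50/50; every other group is split 40% Control / 60% Test
     ...: df['Flag'] = \
     ...:     df.groupby('Group')['customer_id'] \
     ...:       .transform(lambda x: np.random.choice(['Control', 'Test'], len(x),
     ...:                                             p=[.5, .5] if x.name == 1 else [.4, .6]))
     ...: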

In [161]: df
Out[161]:
  customer_id  Group     Flag
0         ABC      1  Control
1         CDE      1     Test
2         BHF      2     Test
3         NID      1  Control
4         WKL      2     Test
5         SDI      2  Control
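A self-contained sketch of the same idea, assuming a recent NumPy: it rebuilds the six-row example frame from the output above, swaps np.random.choice for NumPy's Generator API (np.random.default_rng) with a fixed seed so the assignment is reproducible (the seed value 42 and the helper name draw_flags are arbitrary, hypothetical choices), and keeps the answer's rule of 50/50 for group 1 and 40/60 for every other group. Because each row is drawn independently, the realised Control/Test shares only approximate the target percentages, especially in small groups.

```python
import numpy as np
import pandas as pd

# Example data matching the frame shown in the output above
df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL", "SDI"],
    "Group": [1, 1, 2, 1, 2, 2],
})

# Seeded generator so reruns give the same assignment (seed is arbitrary)
rng = np.random.default_rng(42)

def draw_flags(x: pd.Series) -> np.ndarray:
    """Hypothetical helper: draw one label per row of a single group."""
    # x.name is the group key; group 1 gets 50/50, every other group 40/60
    p = [0.5, 0.5] if x.name == 1 else [0.4, 0.6]
    return rng.choice(["Control", "Test"], size=len(x), p=p)

df["Flag"] = df.groupby("Group")["customer_id"].transform(draw_flags)

# Realised share of each label within each group (approximate, not exact)
print(df.groupby("Group")["Flag"].value_counts(normalize=True))
```

Passing a seeded Generator rather than relying on the global np.random state makes the control/treatment split repeatable, which is usually what you want when an experiment's assignment has to be audited later.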

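UPDATE: with more groups, keep each group's [Control, Test] probabilities in a dictionary keyed by group and look them up with x.name: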

In [8]: df
Out[8]:
  customer_id  Group
0         ABC      1
1         CDE      1
2         BHF      2
3         NID      1
4         WKL      2
5         SDI      2
6         XXX      3
7         XYZ      3
8         XXX      3

In [9]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}

In [10]: df['Flag'] = \
    ...: df.groupby('Group')['customer_id'] \
    ...:   .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
    ...:

In [11]: df
Out[11]:
  customer_id  Group     Flag
0         ABC      1     Test
1         CDE      1     Test
2         BHF      2  Control
3         NID      1  Control
4         WKL      2  Control
5         SDI      2     Test
6         XXX      3     Test
7         XYZ      3     Test
8         XXX      3     Test
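np.random.choice labels each row independently, so a small group can miss its target badly: with p=[.2, .8] and three rows, group 3 ended up with no Control rows at all in the output above. If the percentages must be hit as closely as whole rows allow, one alternative (a different technique from the answer's, sketched here under the assumption that rounding to the nearest whole row is acceptable) is to build a label array with a fixed number of Control rows per group and shuffle it. The helper name assign_exact, the control_share dict, and the seed are hypothetical; the shares are taken from the answer's d.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL", "SDI", "XXX", "XYZ", "XXX"],
    "Group": [1, 1, 2, 1, 2, 2, 3, 3, 3],
})

# Control share per group, from d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
control_share = {1: 0.5, 2: 0.4, 3: 0.2}

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

def assign_exact(x: pd.Series) -> np.ndarray:
    """Hypothetical helper: fixed-count Control/Test split within one group."""
    n = len(x)
    # Python's round() sends exact halves to the nearest even integer
    n_control = int(round(n * control_share[x.name]))
    labels = np.array(["Control"] * n_control + ["Test"] * (n - n_control))
    # Shuffle so *which* rows are Control is random, but the count is fixed
    return rng.permutation(labels)

df["Flag"] = df.groupby("Group")["customer_id"].transform(assign_exact)
print(df.groupby("Group")["Flag"].value_counts())
```

The trade-off is that per-group counts become deterministic (only the row choice is random), which suits small groups; for large groups the two approaches converge and the answer's independent draws are simpler.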
