Python Pandas: Assign Control Vs. Treatment Groupings Randomly Based On %
I am working on an experiment design in which I need to split a dataframe df into control and treatment groups, using a different percentage split for each pre-existing grouping. This is the dataframe df: df.head()
Solution 1:
We can use the numpy.random.choice() method inside a groupby, passing each group its own probabilities for 'Control' and 'Test':
In [160]: df['Flag'] = \
...: df.groupby('Group')['customer_id']\
...: .transform(lambda x: np.random.choice(['Control','Test'], len(x),
...: p=[.5,.5] if x.name==1 else [.4,.6]))
...:
In [161]: df
Out[161]:
customer_id Group Flag
0 ABC 1 Control
1 CDE 1 Test
2 BHF 2 Test
3 NID 1 Control
4 WKL 2 Test
5 SDI 2 Control
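The IPython snippet above assumes that df and np already exist in the session. A minimal self-contained sketch of the same idea follows, assuming the six-row frame shown in the output. The seeded generator, the seed value 42 and the helper name pick_labels are illustrative additions, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data matching the six rows shown in the output above.
df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL", "SDI"],
    "Group": [1, 1, 2, 1, 2, 2],
})

# Assumption: a seeded generator (42 is arbitrary) so reruns give the same labels.
rng = np.random.default_rng(42)

def pick_labels(x):
    # x.name holds the group key: group 1 is split 50/50, any other group 40/60.
    p = [.5, .5] if x.name == 1 else [.4, .6]
    # Draw one label per row of the group, independently of the other rows.
    return rng.choice(["Control", "Test"], size=len(x), p=p)

df["Flag"] = df.groupby("Group")["customer_id"].transform(pick_labels)
print(df)
```

Because every row is drawn independently, the realized shares in a small group can differ noticeably from the requested probabilities.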
UPDATE:
In [8]: df
Out[8]:
customer_id Group
0 ABC 1
1 CDE 1
2 BHF 2
3 NID 1
4 WKL 2
5 SDI 2
6 XXX 3
7 XYZ 3
8 XXX 3
In [9]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
In [10]: df['Flag'] = \
...: df.groupby('Group')['customer_id'] \
...: .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
...:
In [11]: df
Out[11]:
customer_id Group Flag
0 ABC 1 Test
1 CDE 1 Test
2 BHF 2 Control
3 NID 1 Control
4 WKL 2 Control
5 SDI 2 Test
6 XXX 3 Test
7 XYZ 3 Test
8 XXX 3 Test
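In the output above, every row of group 3 came out as Test even though the target was 20% Control. That can happen because np.random.choice draws each row independently. If the split has to match the percentages as closely as the group size allows, one alternative (a different technique from the answer above, sketched here under assumptions) is to build the labels in the right counts and shuffle them. The helper name exact_split, the seed and the rounding rule are illustrative choices:

```python
import numpy as np
import pandas as pd

# Same nine rows as the frame shown above.
df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL", "SDI", "XXX", "XYZ", "XXX"],
    "Group": [1, 1, 2, 1, 2, 2, 3, 3, 3],
})

# Per-group [Control, Test] shares, as in the dict d above.
d = {1: [.5, .5], 2: [.4, .6], 3: [.2, .8]}

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def exact_split(x):
    n = len(x)
    # Assumption: round the Control count to the nearest integer.
    n_control = int(round(n * d[x.name][0]))
    labels = np.array(["Control"] * n_control + ["Test"] * (n - n_control))
    # Shuffle so that which rows land in Control is random.
    return rng.permutation(labels)

df["Flag"] = df.groupby("Group")["customer_id"].transform(exact_split)
print(df.groupby(["Group", "Flag"]).size())
```

With this approach the number of Control rows per group is fixed in advance, and only the choice of which rows receive the label is random.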