
Updating Column In A Dataframe Based On Multiple Columns

I have a column named 'age' with a few NaN values. A crude way to derive the missing ages is to take the mean age within each combination of two key categorical variables, job and gender. df = pd.DataFra

Solution 1:

Use Series.fillna with GroupBy.transform. Because the sample data has no non-missing age for the combination c, M, that row stays NaN:

df['age']= df['age'].fillna(df.groupby(['job','gender'])['age'].transform('mean'))
print (df)
    col1   age job gender
0      1  19.0   a      M
1      2  23.0   b      F
2      1   NaN   c      M
3      2  29.0   d      F
4      3  70.0   e      M
5      4  32.0   a      F
6     11  27.0   b      M
7     12  48.0   c      F
8     13  39.0   d      M
9     12  70.0   e      M
10    11  29.0   a      F
11     1  51.0   b      F
12    10  48.0   c      F
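
To see why a NaN can survive this fill, here is a minimal sketch on a small hypothetical frame (the `toy` data and the `age_filled` column are made up for illustration and are not the frame from the question). It assumes only that pandas and NumPy are installed. `transform('mean')` returns one value per row, aligned to the original index, and a group whose ages are all missing gets NaN as its mean, so there is nothing to fill with.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the frame from the question.
toy = pd.DataFrame({
    "job":    ["a", "a", "b", "b", "c"],
    "gender": ["M", "M", "F", "F", "M"],
    "age":    [20.0, np.nan, 30.0, 40.0, np.nan],
})

# One mean per row, aligned to toy's index, so it can be passed to fillna.
group_mean = toy.groupby(["job", "gender"])["age"].transform("mean")

# Only missing ages are replaced; existing values are left untouched.
toy["age_filled"] = toy["age"].fillna(group_mean)
print(toy)
```

Row 1 is filled from the other (a, M) row, while the single (c, M) row has no known age in its group and therefore remains NaN.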

If you also need to replace the remaining NaN by grouping only by job, add another fillna:

avg1 = df.groupby(['job','gender'])['age'].transform('mean')
avg2 = df.groupby('job')['age'].transform('mean')

df['age'] = df['age'].fillna(avg1).fillna(avg2)
print (df)
    col1   age job gender
0      1  19.0   a      M
1      2  23.0   b      F
2      1  48.0   c      M
3      2  29.0   d      F
4      3  70.0   e      M
5      4  32.0   a      F
6     11  27.0   b      M
7     12  48.0   c      F
8     13  39.0   d      M
9     12  70.0   e      M
10    11  29.0   a      F
11     1  51.0   b      F
12    10  48.0   c      F
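
The same fallback idea can be wrapped in a small helper. This is only a sketch under assumptions: `fill_by_group_means` and the `toy` frame are hypothetical names and data introduced here, not part of the original answer. Each level's means are computed from the original column, as with avg1 and avg2 above, so values filled at a finer level do not influence the coarser averages.

```python
import numpy as np
import pandas as pd

def fill_by_group_means(df, target, key_levels):
    """Fill NaN in `target` with group means, trying each key set in order."""
    filled = df[target].copy()
    for keys in key_levels:
        # Means come from the original df[target], not from `filled`.
        filled = filled.fillna(df.groupby(keys)[target].transform("mean"))
    return filled

# Hypothetical toy data: (c, M) has no known age, but job c does.
toy = pd.DataFrame({
    "job":    ["a", "a", "c", "c", "c"],
    "gender": ["M", "M", "M", "F", "F"],
    "age":    [20.0, np.nan, np.nan, 40.0, 50.0],
})

# Try (job, gender) first, then fall back to job alone.
toy["age"] = fill_by_group_means(toy, "age", [["job", "gender"], "job"])
print(toy)
```

Here row 1 is filled from its (a, M) group, and row 2 falls back to the mean of job c because its (c, M) group has no known ages.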
