
Variable Fillna() In Each Column

For starters, here is some artificial data fitting my problem: df = pd.DataFrame(np.random.randint(0, 100, size=(vsize, 10)), columns = ['col_{}'.format(x) for x in rang

Solution 1:

One quick solution is to replace your min_max_check with a gen_noise function that generates noise for every row:

def gen_noise(col):
    num_row = len(df)

    # generate noise of the same height as our dataset
    # notice the size argument in randint
    if ((df[col].dropna() >= 0) & (df[col].dropna() <= 1.0)).all():
        noise = 0
    elif (df[col].dropna() >= 0).all():
        noise =  np.random.randint(low = 0, 
                                   high = 3, 
                                   size=num_row)
    else:
        noise =  np.random.randint(low = -3, 
                                   high = 3,
                                   size=num_row)

    # multiplication with isna() forces those at non-null values in df[col] to be 0
    return noise * df[col].isna()
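To see why the multiplication by isna() works, here is a minimal sketch on a toy Series. The data, the seed, and the names s, rng, and masked_noise are made up for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing values
s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Fixed seed only so the sketch is reproducible
rng = np.random.default_rng(0)
noise = rng.integers(low=-3, high=3, size=len(s))

# isna() is a boolean Series; in the product True acts as 1 and False as 0,
# so the noise survives only where the original value was missing
masked_noise = noise * s.isna()
print(masked_noise)
```

Rows 0 and 2 hold real values, so their noise is always 0. Rows 1 and 3 keep whatever integer was drawn for them.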

And then later:

df.set_index(tar, inplace=True)

for col in cols[:1]:
    noise = gen_noise(col)
    df[col] = (df[col].fillna(medians[col])
                      .add(noise.mul(stds[col]).values)
              )

df.reset_index(inplace=True)
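For reference, the same idea can be tried end to end on a small frame. This is only a sketch under assumptions: the data and the column names group and col_0 are invented, and it uses Series.map to look up each row's group statistic instead of the set_index/reset_index round trip shown above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

# Hypothetical toy data: one grouping column and one numeric column with gaps
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "col_0": [10.0, 12.0, np.nan, 50.0, np.nan, 54.0],
})
tar, col = "group", "col_0"

# Per-group statistics, indexed by the group label
medians = df.groupby(tar)[col].median()
stds = df.groupby(tar)[col].std()

was_missing = df[col].isna()

# Integer noise in [-3, 3), zeroed wherever a value was already present
noise = pd.Series(rng.integers(-3, 3, size=len(df)), index=df.index) * was_missing

# map() looks up the statistic of each row's group
row_median = df[tar].map(medians)
row_std = df[tar].map(stds)

# Missing cells become group median + noise * group std; other cells are unchanged
df[col] = df[col].fillna(row_median) + noise * row_std
print(df)
```

The map-based lookup keeps the original index untouched, which may be convenient if you do not want to restore it afterwards.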

Note: You can modify the code further by generating a noise_df with the same shape as medians and stds, something like this:

for tar in tar_list:
    medians = df[cols].groupby(df[tar]).agg('median')
    stds = df[cols].groupby(df[tar]).agg('std')

    # generate noise_df here
    medians = medians + round(noise_df * stds, 2)

    df.set_index(tar, inplace=True)

    for col in cols[:1]:
        df[col] = df[col].fillna(medians[col])    

    df.reset_index(inplace=True)

df.index = idx
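The "generate noise_df here" step is left open above. One possible way to fill it in, assuming integer noise in [-3, 3) as in gen_noise, is sketched below; the toy data and the name noisy_medians are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

# Hypothetical toy data with one grouping column and two numeric columns
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "col_0": [1.0, 3.0, 10.0, 14.0],
    "col_1": [2.0, 4.0, 20.0, 28.0],
})
cols = ["col_0", "col_1"]
tar = "group"

medians = df[cols].groupby(df[tar]).agg("median")
stds = df[cols].groupby(df[tar]).agg("std")

# One noise value per (group, column) cell, sharing index and columns with medians
noise_df = pd.DataFrame(
    rng.integers(-3, 3, size=medians.shape),
    index=medians.index,
    columns=medians.columns,
)

# Shift each group median by its noise scaled with the group std
noisy_medians = medians + (noise_df * stds).round(2)
print(noisy_medians)
```

Note the trade-off: with this variant every missing value in the same group and column receives the same fill value, whereas gen_noise draws a separate noise value for each row.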
