Variable Fillna() In Each Column
For starters, here is some artificial data fitting my problem: df = pd.DataFrame(np.random.randint(0, 100, size=(vsize, 10)), columns = ['col_{}'.format(x) for x in rang
Solution 1:
One quick solution is to turn your min_max_check into a gen_noise function that generates one noise value per row:
def gen_noise(col):
    num_row = len(df)
    # generate noise of the same height as our dataset
    # notice the size argument in randint
    if ((df[col].dropna() >= 0) & (df[col].dropna() <= 1.0)).all():
        noise = 0
    elif (df[col].dropna() >= 0).all():
        noise = np.random.randint(low=0, high=3, size=num_row)
    else:
        noise = np.random.randint(low=-3, high=3, size=num_row)
    # multiplying by isna() forces the noise to 0 wherever df[col] is not null
    return noise * df[col].isna()
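To see why the final multiplication matters, here is a minimal sketch of the masking step on its own. The series `s` and the fixed `noise` array are made-up values for illustration, and the fill value of 10 stands in for a median:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Fixed noise values so the result is easy to follow (normally random).
noise = np.array([2, -1, 1, 2])

# isna() is True only at the gaps, so the product is 0 everywhere else.
masked = noise * s.isna()

# Fill the gaps with a placeholder value, then add the masked noise.
result = s.fillna(10).add(masked)
```

Only the positions that were originally NaN receive noise; the existing values 1.0 and 3.0 pass through unchanged.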
And then later:
df.set_index(tar, inplace=True)
for col in cols[:1]:
    noise = gen_noise(col)
    df[col] = (df[col].fillna(medians[col])
                      .add(noise.mul(stds[col]).values))
df.reset_index(inplace=True)
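For reference, the same idea can be sketched end to end on toy data. This is not the exact code above: it swaps the set_index alignment for `groupby(...).transform`, and the frame, the column names, and the helper `fill_with_group_noise` are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: one grouping column and one numeric column with gaps.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b", "b"],
    "col_0": [10.0, np.nan, 14.0, 50.0, 54.0, np.nan],
})

def fill_with_group_noise(frame, col, tar):
    """Fill NaNs in `col` with the group median plus integer noise times the group std."""
    grouped = frame.groupby(tar)[col]
    med = grouped.transform("median")  # per-row median of the row's group
    std = grouped.transform("std")     # per-row std of the row's group
    # One integer in [-3, 3) per row, matching np.random.randint(-3, 3).
    noise = rng.integers(-3, 3, size=len(frame))
    # fillna aligns on the index and only touches missing cells.
    return frame[col].fillna(med + noise * std)

filled = fill_with_group_noise(df, "col_0", "target")
```

Because `fillna` only writes into missing cells, there is no need to mask the noise explicitly in this variant.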
Note: You can modify the code further by generating a noise_df with the same shape as medians and stds, something like this:
for tar in tar_list:
    medians = df[cols].groupby(df[tar]).agg('median')
    stds = df[cols].groupby(df[tar]).agg('std')
    # generate noise_df here
    medians = medians + round(noise_df * stds, 2)
    df.set_index(tar, inplace=True)
    for col in cols[:1]:
        df[col] = df[col].fillna(medians[col])
    df.reset_index(inplace=True)
df.index = idx
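One way the "generate noise_df here" step might look is sketched below. The toy frame, `cols`, and `tar` are stand-ins for the question's data, and the integer range [-3, 3) is borrowed from gen_noise as an assumption:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data standing in for the question's df, cols and tar.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b", "b"],
    "col_0": [10.0, np.nan, 14.0, 50.0, 54.0, np.nan],
    "col_1": [1.0, 3.0, np.nan, np.nan, 7.0, 9.0],
})
cols = ["col_0", "col_1"]
tar = "target"

medians = df.groupby(tar)[cols].median()
stds = df.groupby(tar)[cols].std()

# One noise value per (group, column), laid out exactly like medians and stds.
noise_df = pd.DataFrame(
    rng.integers(-3, 3, size=medians.shape),
    index=medians.index,
    columns=medians.columns,
)
fill_values = medians + (noise_df * stds).round(2)

# Look up each row's group fill values, then fill only the missing cells.
row_fill = fill_values.loc[df[tar]]
row_fill.index = df.index
filled = df.copy()
filled[cols] = df[cols].fillna(row_fill)
```

Note the trade-off: with one noise value per group and column, every missing cell in the same group receives the same fill value, whereas the per-row gen_noise approach varies the fill from row to row.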