
Change With Nan If Values Stuck At A Single Value Over Time Using Python

As you can see below, my dataframe contains some identical consecutive values, i.e. 1, 2, and 3. Date Value 0 2017-07-18 07:40:00 1 1 2017-07-18 07:45:00 1 2 2017-07-18 07:50:0

Solution 1:

You could group consecutive identical values using a custom grouping key, check which groups have a size greater than or equal to 3, and use the resulting boolean mask to index the dataframe and set the rows of interest to NaN:

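import numpy as np
# Give each run of identical consecutive values its own group id
g = df.Value.diff().fillna(0).ne(0).cumsum()
# True for every row that belongs to a run of 3 or more
m = df.groupby(g).Value.transform('size').ge(3)
df.loc[m,'Value'] = np.nan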

                   Date   Value
0   2017-07-18-07:40:00     NaN
1   2017-07-18-07:45:00     NaN
2   2017-07-18-07:50:00     NaN
3   2017-07-18-07:55:00  2414.0
4   2017-07-18-08:00:00     2.0
5   2017-07-18-08:05:00     2.0
6   2017-07-18-08:10:00  4416.0
7   2017-07-18-08:15:00  4416.0
8   2017-07-18-08:20:00     NaN
9   2017-07-18-08:25:00     NaN
10  2017-07-18-08:30:00     NaN
11  2017-07-18-08:35:00  6998.0

Where (with df being the dataframe as it was before the assignment above):

df.assign(grouper=g, mask=m, result=df.Value.mask(m))

                   Date  Value  grouper   mask  result
0   2017-07-18-07:40:00      1        0   True     NaN
1   2017-07-18-07:45:00      1        0   True     NaN
2   2017-07-18-07:50:00      1        0   True     NaN
3   2017-07-18-07:55:00   2414        1  False  2414.0
4   2017-07-18-08:00:00      2        2  False     2.0
5   2017-07-18-08:05:00      2        2  False     2.0
6   2017-07-18-08:10:00   4416        3  False  4416.0
7   2017-07-18-08:15:00   4416        3  False  4416.0
8   2017-07-18-08:20:00      3        4   True     NaN
9   2017-07-18-08:25:00      3        4   True     NaN
10  2017-07-18-08:30:00      3        4   True     NaN
11  2017-07-18-08:35:00   6998        5  False  6998.0
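
For reference, the steps above can be wrapped into a small helper and run end to end. This is only a minimal sketch: the function name mask_stuck_runs and its min_run parameter are illustrative and not part of the original answer, and the sample frame is rebuilt from the values shown in the table above. It detects run boundaries with ne(shift()) instead of diff(), which should also work for non-numeric columns.

```python
import pandas as pd


def mask_stuck_runs(values: pd.Series, min_run: int = 3) -> pd.Series:
    """Return a copy of `values` in which every run of at least `min_run`
    identical consecutive entries is replaced by NaN (hypothetical helper)."""
    # A new run starts wherever the value differs from the previous row.
    run_id = values.ne(values.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_len = values.groupby(run_id).transform('size')
    # Series.mask sets the entries where the condition is True to NaN.
    return values.mask(run_len.ge(min_run))


# Sample data rebuilt from the table above.
df = pd.DataFrame({
    'Date': pd.date_range('2017-07-18 07:40:00', periods=12, freq='5min'),
    'Value': [1, 1, 1, 2414, 2, 2, 4416, 4416, 3, 3, 3, 6998],
})
df['Value'] = mask_stuck_runs(df['Value'])
print(df)
```

Raising min_run makes the check more tolerant of short repeats; the column becomes float because NaN cannot be stored in an integer column.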

Solution 2:

Count how often each value occurs. The result is a Series, which needs a name so that it can be merged later:

counts = df['Value'].value_counts()
counts.name = '_'

Merge the selected values from the series with the original dataframe:

keep = counts[counts < 3]
df.merge(keep, left_on='Value', right_index=True)[df.columns]
#                   Date  Value
#3  2017-07-18  07:55:00   2414
#4  2017-07-18  08:00:00      2
#5  2017-07-18  08:05:00      2
#6  2017-07-18  08:10:00   4416
#7  2017-07-18  08:15:00   4416
#11 2017-07-18  08:35:00   6998

The result is a filtered dataframe: the offending rows are dropped rather than set to NaN.
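
One caveat worth checking against your data: value_counts counts every occurrence of a value in the column, not only consecutive ones, so a value that recurs at scattered times is treated the same as a stuck one. The sketch below illustrates this on a made-up series (the data and variable names are hypothetical) and also shows one possible way to keep all rows and write NaN, using Series.map and Series.where.

```python
import pandas as pd

# Hypothetical data: 5 occurs three times, but never three times in a row.
s = pd.Series([5, 7, 5, 8, 5, 9, 9, 9])

# Total number of occurrences of each row's value in the whole series.
occurrences = s.map(s.value_counts())

# Count-based: keep values seen fewer than 3 times overall, NaN elsewhere.
by_count = s.where(occurrences < 3)

# Run-based: only values repeated 3 or more times in a row become NaN.
run_id = s.ne(s.shift()).cumsum()
by_run = s.mask(s.groupby(run_id).transform('size').ge(3))

print(pd.DataFrame({'value': s, 'by_count': by_count, 'by_run': by_run}))
```

Here the count-based column blanks out the scattered 5s as well as the run of 9s, while the run-based column only blanks out the 9s.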

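Merging directly with a named Series requires pandas 0.24 or later. If you use an older version, you should upgrade, but here is a workaround that converts the Series to a DataFrame first: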

df.merge(pd.DataFrame(keep), left_on='Value', right_index=True)[df.columns]
