How To Cut Unsorted Time-series Data Into Bins With A Minimum Interval?
I have a dataframe like this x = pd.DataFrame({'a':[1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447, 1.13459, 1.13452, 1.13471, 1.1348, 1.13496,1.13474,1.13483,1.1349,1.13502,
Solution 1:
I don't believe there is a vectorized way to do this, so you probably need to loop through the values.
x = x.assign(output=0)  # Initialize all the output values to zero.
out_col = x.columns.get_loc('output')  # Column position, so we can avoid chained assignment.
x.iat[0, out_col] = 1  # The first row always starts a bin.
threshold = 0.0005
prior_val = x['a'].iat[0]
for n, val in enumerate(x['a']):
    if abs(val - prior_val) >= threshold:
        x.iat[n, out_col] = 1
        prior_val = val  # Reset to the new value found that exceeds the threshold.
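The same loop can also be wrapped in a reusable function. The following is only a minimal sketch and not part of the original answer: the name `mark_bin_starts` and its parameters are hypothetical, and it works on a plain Python list instead of writing to the DataFrame cell by cell.

```python
import pandas as pd


def mark_bin_starts(values, threshold):
    """Return a list of 0/1 flags; 1 marks a value that starts a new bin.

    `mark_bin_starts` is a hypothetical helper name, not a pandas function.
    """
    flags = [0] * len(values)
    if not flags:
        return flags
    flags[0] = 1  # The first value always starts a bin.
    prior_val = values[0]
    for n, val in enumerate(values):
        if abs(val - prior_val) >= threshold:
            flags[n] = 1
            prior_val = val  # Later values are compared against this one.
    return flags


x = pd.DataFrame({'a': [1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447,
                        1.13459, 1.13452, 1.13471, 1.1348, 1.13496, 1.13474,
                        1.13483, 1.1349, 1.13502, 1.13515, 1.13526, 1.13512]})
x['output'] = mark_bin_starts(x['a'].tolist(), threshold=0.0005)
```

On the example data this should flag rows 0, 8 and 16, the same rows as the loop above.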
Solution 2:
Here is my attempt, which uses as much vectorization as possible together with a recursive function.
The recursive function builds a one-row DataFrame for each match and returns it to the caller; the pieces are concatenated at the end of the main script.
It uses the nullable integer type added to pandas in version 0.24.
Edit: This solution is ten times slower than the one with loops. You should not use it.
import pandas as pd


def find_next_step(df, initial_value, threshold):
    """Return a list of one-row DataFrames, one per value that starts a new bin."""
    try:
        # Label of the first row that differs from initial_value by at least threshold.
        following_index = (
            df.loc[lambda x: (x['a'] - initial_value).abs() >= threshold]
            .loc[:, 'a']
            .index[0]
        )
    except IndexError:
        # No remaining row reaches the threshold: end of the recursion.
        return []
    # Recurse on the rows after the match, using the matched value as the new reference.
    to_append = find_next_step(
        df.loc[following_index + 1:, :], df.loc[following_index, 'a'], threshold
    )
    to_append.append(
        pd.DataFrame({'output': [1]}, index=[following_index], dtype=pd.Int64Dtype())
    )
    return to_append


if __name__ == '__main__':
    x = pd.DataFrame({'a':[1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447, 1.13459, 1.13452, 1.13471, 1.1348, 1.13496,1.13474,1.13483,1.1349,1.13502,1.13515,1.13526,1.13512]})
    output_list = find_next_step(x.iloc[1:, :], x.loc[:, 'a'].iloc[0], 0.0005)
    # The first row always starts a bin.
    output_list.append(pd.DataFrame({'output': [1]}, index=[0], dtype=pd.Int64Dtype()))
    output_series = pd.concat(
        [x, pd.concat(output_list).sort_index()], axis='columns'
    ).assign(output=lambda x: x['output'].fillna(0))
    print(output_series)
It works on your example and prints:
          a  output
0   1.13410       1
1   1.13421       0
2   1.13433       0
3   1.13412       0
4   1.13435       0
5   1.13447       0
6   1.13459       0
7   1.13452       0
8   1.13471       1
9   1.13480       0
10  1.13496       0
11  1.13474       0
12  1.13483       0
13  1.13490       0
14  1.13502       0
15  1.13515       0
16  1.13526       1
17  1.13512       0
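If bin labels are wanted rather than 0/1 markers, one option is a cumulative sum over the flag column: each 1 starts a new bin, so the running total numbers the bins. This is a sketch that assumes labelled bins are the goal, which the answers do not state explicitly; the column name `bin_id` is made up for illustration.

```python
import pandas as pd

# Flags as shown in the output for the example data: rows 0, 8 and 16 start a bin.
flags = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
df = pd.DataFrame({'output': flags})

# Each 1 increments the running total, so rows share an id until the next 1.
df['bin_id'] = df['output'].cumsum()

# Number of rows that fall into each bin.
bin_sizes = df.groupby('bin_id').size()
```

The resulting `bin_id` column can then be passed to `groupby` for per-bin aggregation.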