
How To Cut Unsorted Time-series Data Into Bins With A Minimum Interval?

I have a dataframe like this x = pd.DataFrame({'a':[1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447, 1.13459, 1.13452, 1.13471, 1.1348, 1.13496,1.13474,1.13483,1.1349,1.13502,

Solution 1:

I don't believe there is a vectorized way to do this, so you probably need to loop through the values.

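x = x.assign(output=0)  # Initialize all the output values to zero.
out_col = x.columns.get_loc('output')  # Positional index of the new column.
x.iat[0, out_col] = 1  # The first row always starts a bin.
threshold = 0.0005
prior_val = x['a'].iat[0]
for n, val in enumerate(x['a']):
    if abs(val - prior_val) >= threshold:
        x.iat[n, out_col] = 1  # Write through x directly; chained assignment may not update x.
        prior_val = val  # Reset to new value found that exceeds threshold.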
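The loop above can also be wrapped in a reusable function. The following is a minimal sketch, not the answerer's code: the name `flag_threshold_steps` is hypothetical, and it assumes the values fit in a NumPy float array and that the first value always starts a bin.

```python
import numpy as np
import pandas as pd


def flag_threshold_steps(values, threshold):
    """Return a 0/1 array with 1 wherever a value is at least
    `threshold` away from the most recently flagged value.
    (Hypothetical helper, named here for illustration only.)"""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=int)
    if len(values) == 0:
        return flags
    flags[0] = 1  # Assumption: the first value always starts a bin.
    prior_val = values[0]
    for n in range(1, len(values)):
        if abs(values[n] - prior_val) >= threshold:
            flags[n] = 1
            prior_val = values[n]  # Later values are compared against this one.
    return flags


x = pd.DataFrame({'a': [1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447,
                        1.13459, 1.13452, 1.13471, 1.1348, 1.13496, 1.13474,
                        1.13483, 1.1349, 1.13502, 1.13515, 1.13526, 1.13512]})
x['output'] = flag_threshold_steps(x['a'], threshold=0.0005)
flagged = x.index[x['output'] == 1].tolist()  # Row labels where a new bin starts.
```

Building the flags in a plain array and assigning the column once avoids writing into the dataframe cell by cell inside the loop.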

Solution 2:

Here is my attempt, using as much vectorization as possible together with a recursive function.

For each value that crosses the threshold, the recursive function builds a one-row dataframe and returns it to the caller; the rows are concatenated at the end of the main block.

It uses the nullable integer type added to pandas in version 0.24.

Edit: This solution is ten times slower than the one with loops. You should not use it.
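As a small illustration of why the nullable integer type matters here, the sketch below (with made-up values, not the question's data) shows that a column with missing entries keeps an integer dtype under `pd.Int64Dtype()` rather than being converted to float, so `fillna(0)` yields clean integers.

```python
import pandas as pd

# A flag column with a missing entry, stored with the nullable integer dtype.
sparse_flags = pd.Series([1, None, 1], dtype=pd.Int64Dtype())

# The missing entry is held as <NA>, and the dtype stays integer.
dtype_name = str(sparse_flags.dtype)

# Filling the gaps with 0 keeps the integer dtype.
filled_flags = sparse_flags.fillna(0)
```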

import pandas as pd


def find_next_step(df, initial_value, threshold):
    try:
        following_index = (
            df.loc[lambda x: (x['a'] - initial_value).abs() >= threshold]
            .loc[:, 'a']
            .index[0]
        )
    except IndexError:
        return []
    to_append = find_next_step(
        df.loc[following_index + 1:, :], df.loc[following_index, 'a'], threshold
    )
    to_append.append(
        pd.DataFrame({'output': [1]}, index=[following_index], dtype=pd.Int64Dtype())
    )
    return to_append


if __name__ == '__main__':
    x = pd.DataFrame({'a':[1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447, 1.13459, 1.13452, 1.13471, 1.1348, 1.13496,1.13474,1.13483,1.1349,1.13502,1.13515,1.13526,1.13512]})
    output_list = find_next_step(x.iloc[1:, :], x.loc[:, 'a'].iloc[0], 0.0005)
    output_list.append(pd.DataFrame({'output': [1]}, index=[0], dtype=pd.Int64Dtype()))
    output_series = pd.concat(
        [x, pd.concat(output_list).sort_index()], axis='columns'
    ).assign(output=lambda x: x['output'].fillna(0))
    print(output_series)

It works on your example and prints:

          a  output
0   1.13410       1
1   1.13421       0
2   1.13433       0
3   1.13412       0
4   1.13435       0
5   1.13447       0
6   1.13459       0
7   1.13452       0
8   1.13471       1
9   1.13480       0
10  1.13496       0
11  1.13474       0
12  1.13483       0
13  1.13490       0
14  1.13502       0
15  1.13515       0
16  1.13526       1
17  1.13512       0
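The same search-and-jump idea can be written without recursion, which avoids Python's recursion limit when there are many crossings. This is only a sketch under assumptions: the function name `flag_steps_by_search` is hypothetical, and it works on a plain NumPy array instead of dataframe slices.

```python
import numpy as np


def flag_steps_by_search(values, threshold):
    """Flag the first value, then repeatedly search the remaining values
    for the first one at least `threshold` away from the last flagged one.
    (Hypothetical helper, named here for illustration only.)"""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=int)
    if len(values) == 0:
        return flags
    flags[0] = 1
    start = 0
    while True:
        rest = values[start + 1:]
        # Vectorized search: positions in `rest` that cross the threshold.
        hits = np.flatnonzero(np.abs(rest - values[start]) >= threshold)
        if hits.size == 0:
            break
        start = start + 1 + int(hits[0])  # Jump to the first crossing.
        flags[start] = 1
    return flags


sample = [1.1341, 1.13421, 1.13433, 1.13412, 1.13435, 1.13447, 1.13459,
          1.13452, 1.13471, 1.1348, 1.13496, 1.13474, 1.13483, 1.1349,
          1.13502, 1.13515, 1.13526, 1.13512]
search_flags = flag_steps_by_search(sample, 0.0005)
search_positions = np.flatnonzero(search_flags).tolist()
```

Each search is vectorized, but the remaining values are rescanned after every jump, so this is not expected to beat the plain loop when crossings are frequent.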
