
Counting Data Points Within Limits, And Applying Buffer To Isolated Points [data Analysis]

I am stuck trying to solve this problem: I have a set of data points that correspond to a set of time values, i.e. values = [1,2,3,4,5,6,7,8,4], times = [0.1,0.2,0.3,0.4]... and so

Solution 1:

Here's a function that does what you want. Each run of multiple consecutive data points that are within the specified limits is given a time value equal to the number of data points times the sampling period (i.e. the reciprocal of the sampling frequency). An isolated single point is given a value of half the sampling period.

#!/usr/bin/env python

''' Estimate time of data points falling within specified limits
    From http://stackoverflow.com/q/29430625/4014959
    Written 2015.04.03 by PM 2Ring,
    with help from Antti Haapala and Martijn Pieters
'''

from itertools import groupby

def estimate_time(values, lo_lim, hi_lim, sample_rate):
    #Find values that are in range
    in_range = [lo_lim <= v <= hi_lim for v in values]

    #Find runs of in-range values
    runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]

    #Estimate total time spent in-range
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate


values = [1, 2, 3, 4, 5, 6, 7, 8, 4]
sample_rate = 10.0  # in Hz

lo_lim = 3
hi_lim = 5

print(estimate_time(values, lo_lim, hi_lim, sample_rate))

output

0.35

To check that this code really does what you want you can put some print statements into estimate_time() to show the contents of in_range and runs.
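
As a rough sketch of that check, here is a copy of the function with two print calls added. The name estimate_time_verbose is made up for this illustration, and the print() calls are written in Python 3 style (on Python 2 you would add from __future__ import print_function at the top).

```python
from itertools import groupby

def estimate_time_verbose(values, lo_lim, hi_lim, sample_rate):
    # Hypothetical debugging variant of estimate_time(); the name is an assumption.
    # Flag each value as inside (True) or outside (False) the limits
    in_range = [lo_lim <= v <= hi_lim for v in values]
    print("in_range:", in_range)

    # Length of each run of consecutive in-range values
    runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]
    print("runs:", runs)

    # Runs longer than one point count in full; isolated points count as half
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate

result = estimate_time_verbose([1, 2, 3, 4, 5, 6, 7, 8, 4], 3, 5, 10.0)
print(result)
```

For the sample data, in_range is [False, False, True, True, True, False, False, False, True] and runs is [3, 1]: the run of three points contributes 3 sampling periods and the isolated final point contributes 0.5, so the total is 3.5 / 10.0 = 0.35.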


One thing you can do to reduce memory requirements is to convert the list comprehensions into generator expressions. List comprehensions have to create a whole new list in memory (which is deleted once it goes out of scope); a generator expression is a little slower, but it doesn't need to build a list - results are generated as they're needed. The syntax is very similar - just replace the square brackets of the list comp with round brackets to turn it into a gen exp.

So change

in_range = [lo_lim <= v <= hi_lim for v in values]

to

in_range = (lo_lim <= v <= hi_lim for v in values)

and

runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]

to

runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)
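
Putting both changes together, the function might look something like the sketch below. The name estimate_time_lazy is just a label for this illustration, not part of the original answer. One thing to keep in mind is that a generator expression can only be iterated once, so you can't print in_range or runs for debugging and then still use them in the calculation.

```python
from itertools import groupby

def estimate_time_lazy(values, lo_lim, hi_lim, sample_rate):
    # Hypothetical name; same logic as estimate_time(), but no intermediate lists.
    # Lazily flag each value as in range or not
    in_range = (lo_lim <= v <= hi_lim for v in values)

    # Lazily yield the length of each run of in-range values
    runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)

    # sum() pulls items through both generators one at a time
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate

# values can itself be a one-shot iterator, e.g. data streamed from a file
lazy_result = estimate_time_lazy(iter([1, 2, 3, 4, 5, 6, 7, 8, 4]), 3, 5, 10.0)
print(lazy_result)
```

Since nothing is stored beyond the current run count, this version should also work when values is too large to hold in memory comfortably.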
