Skip to content Skip to sidebar Skip to footer

Efficient Numpy Subarrays Extraction From A Mask

I am searching a pythonic way to extract multiple subarrays from a given array using a mask as shown in the example: a = np.array([10, 5, 3, 2, 1]) m = np.array([True, True, False,

Solution 1:

Here's one approach -

defseparate_regions(a, m):
    m0 = np.concatenate(( [False], m, [False] ))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i inrange(0,len(idx),2)]

Sample run -

In [41]: a = np.array([10, 5, 3, 2, 1])
    ...: m = np.array([True, True, False, True, True])
    ...: 

In [42]: separate_regions(a, m)
Out[42]: [array([10,  5]), array([2, 1])]

Runtime test

Other approach(es) -

# @kazemakase's solndefzip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1

    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)

    L = [aseg for aseg, mseg inzip(asplit, msplit) if np.all(mseg)]
    return L

Timings -

In [49]: a = np.random.randint(0,9,(100000))

In [50]: m = np.random.rand(100000)>0.2# @kazemakase's's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop

# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop

# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop

Increasing the average length of islands -

In [58]: a = np.random.randint(0,9,(100000))

In [59]: m = np.random.rand(100000)>0.1

In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop

In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop

In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop

Solution 2:

def splitByBool(a, m):
    if m[0]:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[::2]
    else:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[1::2] 

This will return a list of arrays, split into chunks of True in m

Solution 3:

Sounds like a natural application for np.split.

You first have to figure out where to cut the array, which is where the mask changes between True and False. Next discard all elements where the mask is False.

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

d = np.diff(m)
cuts = np.flatnonzero(d) + 1

asplit = np.split(a, cuts)
msplit = np.split(m, cuts)

L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

print(L[0])  # [10  5]print(L[1])  # [2 1]

Post a Comment for "Efficient Numpy Subarrays Extraction From A Mask"