Efficient Numpy Subarrays Extraction From A Mask

February 25, 2024

I am looking for a Pythonic way to extract multiple subarrays from a given array using a mask, as shown in the example: a = np.array([10, 5, 3, 2, 1]) m = np.array([True, True, False,

Solution 1:

Here's one approach -

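import numpy as np

def separate_regions(a, m):
    # Pad the mask with False so every run of True has a start and an end edge
    m0 = np.concatenate(( [False], m, [False] ))
    # Indices where the padded mask flips: start, stop, start, stop, ...
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0,len(idx),2)]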

Sample run -

In [41]: a = np.array([10, 5, 3, 2, 1])
    ...: m = np.array([True, True, False, True, True])
    ...: 

In [42]: separate_regions(a, m)
Out[42]: [array([10,  5]), array([2, 1])]
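
To see why this works, it can help to look at the intermediate arrays. The following is a minimal sketch that repeats the same steps on the sample input outside the function; m0 and idx follow the names used in separate_regions, while starts, stops and regions are names introduced here only for illustration.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# Pad the mask with False on both sides so that a run of True touching
# either end of the array still produces a start edge and a stop edge.
m0 = np.concatenate(([False], m, [False]))

# Positions where the padded mask changes value. They alternate between
# the start of a True run and the position just past its end.
idx = np.flatnonzero(m0[1:] != m0[:-1])
print(idx)  # [0 2 3 5]

starts = idx[0::2]  # illustrative name: first index of each True run
stops = idx[1::2]   # illustrative name: one past the last index of each run

regions = [a[s:e] for s, e in zip(starts, stops)]
print(regions)  # [array([10,  5]), array([2, 1])]
```

Because of the padding, idx always has an even number of entries, so the start/stop pairing never runs off the end.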

Runtime test

Other approach(es) -

# @kazemakase's soln
def zip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1

    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)

    L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
    return L

Timings -

In [49]: a = np.random.randint(0,9,(100000))

In [50]: m = np.random.rand(100000)>0.2

# @kazemakase's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop

# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop

# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop

Increasing the average length of islands -

In [58]: a = np.random.randint(0,9,(100000))

In [59]: m = np.random.rand(100000)>0.1

In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop

In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop

In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop
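
The timings are only comparable if all three functions return the same islands. Here is a rough, self-contained sketch for checking that and re-running the comparison outside IPython. The function bodies are restated from this post in lightly condensed form. The helper same_regions, the fixed seed and the repetition count are assumptions made for this example. No timing figures are given because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def separate_regions(a, m):
    m0 = np.concatenate(([False], m, [False]))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]

def zip_split(a, m):
    cuts = np.flatnonzero(np.diff(m)) + 1
    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)
    return [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

def splitByBool(a, m):
    pieces = np.split(a, np.nonzero(np.diff(m))[0] + 1)
    return pieces[::2] if m[0] else pieces[1::2]

def same_regions(x, y):
    # Hypothetical helper: two lists of arrays match element by element.
    return len(x) == len(y) and all(np.array_equal(p, q) for p, q in zip(x, y))

rng = np.random.default_rng(0)  # fixed seed, chosen here for repeatability
a = rng.integers(0, 9, 100000)
m = rng.random(100000) > 0.2

reference = separate_regions(a, m)
agree_zip = same_regions(reference, zip_split(a, m))
agree_bool = same_regions(reference, splitByBool(a, m))
print(agree_zip, agree_bool)

# Average wall-clock time per call over a few repetitions.
for fn in (zip_split, splitByBool, separate_regions):
    seconds = timeit.timeit(lambda: fn(a, m), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```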

Solution 2:

def splitByBool(a, m):
    if m[0]:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[::2]
    else:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[1::2]

This returns a list of arrays, one for each contiguous run of True in m.
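
The two branches differ only in which alternating pieces are kept, which is easiest to see with a mask that starts with False. Below is a small usage sketch. The function is repeated from above so the block runs on its own, and the second input pair (a2, m2) is made up for this example.

```python
import numpy as np

def splitByBool(a, m):
    if m[0]:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[::2]
    else:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[1::2]

# Mask starts with True: the pieces at even positions are the True runs.
a1 = np.array([10, 5, 3, 2, 1])
m1 = np.array([True, True, False, True, True])
out1 = splitByBool(a1, m1)
print(out1)  # [array([10,  5]), array([2, 1])]

# Mask starts with False: the pieces at odd positions are the True runs.
a2 = np.array([7, 8, 9, 4, 6, 1])  # illustrative data, not from the post
m2 = np.array([False, True, True, False, False, True])
out2 = splitByBool(a2, m2)
print(out2)  # [array([8, 9]), array([1])]
```

Note that the function reads m[0], so an empty mask raises an IndexError and would need to be handled separately.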

Solution 3:

Sounds like a natural application for np.split.

You first have to figure out where to cut the array, which is wherever the mask changes between True and False. Next, discard the segments where the mask is False.

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

d = np.diff(m)
cuts = np.flatnonzero(d) + 1

asplit = np.split(a, cuts)
msplit = np.split(m, cuts)

L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

print(L[0])  # [10  5]
print(L[1])  # [2 1]
