Efficient Numpy Subarrays Extraction From A Mask
I am looking for a Pythonic way to extract multiple subarrays from a given array using a mask, as shown in the example: a = np.array([10, 5, 3, 2, 1]) m = np.array([True, True, False,
Solution 1:
Here's one approach -
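import numpy as np
def separate_regions(a, m):
    # Pad the mask with False so runs touching either end still get both edges
    m0 = np.concatenate(([False], m, [False]))
    # Indices where the padded mask flips; they alternate start, stop, start, ...
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]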
Sample run -
In [41]: a = np.array([10, 5, 3, 2, 1])
...: m = np.array([True, True, False, True, True])
...:
In [42]: separate_regions(a, m)
Out[42]: [array([10, 5]), array([2, 1])]
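To see why stepping through idx two at a time works, it can help to print the intermediate arrays for the sample input. The following is only a minimal sketch, not part of the original answer: m0 and idx follow the function above, while starts, stops and regions are names introduced here for illustration.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# Pad with False on both sides so every True run has a start and an end edge.
m0 = np.concatenate(([False], m, [False]))

# Positions where the padded mask changes value.
idx = np.flatnonzero(m0[1:] != m0[:-1])

# Even entries are run starts, odd entries are (exclusive) run stops.
starts, stops = idx[::2], idx[1::2]

regions = [a[s:e] for s, e in zip(starts, stops)]
print(idx.tolist())                   # [0, 2, 3, 5]
print([r.tolist() for r in regions])  # [[10, 5], [2, 1]]
```

Because of the padding, idx always has an even number of entries, so each start has a matching stop.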
Runtime test
Other approach(es) -
# @kazemakase's soln
def zip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1
    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)
    L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
    return L
Timings -
In [49]: a = np.random.randint(0,9,(100000))
In [50]: m = np.random.rand(100000)>0.2
# @kazemakase's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop
# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop
# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop
Increasing the average length of islands -
In [58]: a = np.random.randint(0,9,(100000))
In [59]: m = np.random.rand(100000)>0.1
In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop
In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop
In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop
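The %timeit lines above only run inside IPython. As a rough sketch of how the comparison might be reproduced in a plain script, the following uses the standard timeit module. The fixed seed, the repeat count and the helper name as_lists are assumptions made here, and the numbers it prints will differ from those quoted above depending on hardware and NumPy version.

```python
import timeit
import numpy as np

def separate_regions(a, m):
    m0 = np.concatenate(([False], m, [False]))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]

def splitByBool(a, m):
    parts = np.split(a, np.nonzero(np.diff(m))[0] + 1)
    return parts[::2] if m[0] else parts[1::2]

def zip_split(a, m):
    cuts = np.flatnonzero(np.diff(m)) + 1
    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)
    return [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

def as_lists(regions):
    # Hypothetical helper: convert arrays to lists so results compare with ==.
    return [r.tolist() for r in regions]

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
a = rng.integers(0, 9, 100000)
m = rng.random(100000) > 0.2

# All three functions should agree before their timings are compared.
expected = as_lists(separate_regions(a, m))
assert as_lists(splitByBool(a, m)) == expected
assert as_lists(zip_split(a, m)) == expected

for func in (zip_split, splitByBool, separate_regions):
    seconds = timeit.timeit(lambda: func(a, m), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```

Checking that the outputs match first guards against timing a function that returns something different.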
Solution 2:
def splitByBool(a, m):
    if m[0]:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[::2]
    else:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[1::2]
This returns a list of arrays, one for each run of consecutive True values in m.
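The two branches exist because np.split returns chunks that alternate between True runs and False runs, and the value of m[0] decides which of them come first. A small check exercises the branch where the mask starts with False. It is written here purely as an illustration: the function is a compact rewrite of the one above, and the mask is a made-up input.

```python
import numpy as np

def splitByBool(a, m):
    chunks = np.split(a, np.nonzero(np.diff(m))[0] + 1)
    # Chunks alternate between True and False runs; m[0] tells which come first.
    return chunks[::2] if m[0] else chunks[1::2]

a = np.array([10, 5, 3, 2, 1])
m = np.array([False, True, True, False, True])  # illustrative mask, not from the question

result = [chunk.tolist() for chunk in splitByBool(a, m)]
print(result)  # [[5, 3], [1]]
```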
Solution 3:
Sounds like a natural application for np.split.
You first have to figure out where to cut the array, which is where the mask changes between True and False. Next, discard all elements where the mask is False.
a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])
d = np.diff(m)
cuts = np.flatnonzero(d) + 1
asplit = np.split(a, cuts)
msplit = np.split(m, cuts)
L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
print(L[0]) # [10 5]
print(L[1]) # [2 1]
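Printing the intermediate values for the sample input makes the two steps (cut, then discard) concrete. This is only an illustrative sketch that reuses the variable names from the snippet above; kept is a name introduced here.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# np.diff on a boolean array marks positions where neighbours differ.
d = np.diff(m)
cuts = np.flatnonzero(d) + 1
print(cuts.tolist())  # [2, 3]

# Split the data and the mask at the same positions.
asplit = np.split(a, cuts)
msplit = np.split(m, cuts)
print([seg.tolist() for seg in asplit])  # [[10, 5], [3], [2, 1]]
print([seg.tolist() for seg in msplit])  # [[True, True], [False], [True, True]]

# Keep only the segments whose mask is entirely True.
L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
kept = [seg.tolist() for seg in L]
print(kept)  # [[10, 5], [2, 1]]
```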