Slice Multiple Frame Of Numpy Array With Multiple Y1:y2, X1:x2
Solution 1:
The real question here is how to convert arbitrary slices into something you can use across multiple dimensions without looping. I would posit that the trick is to use a clever combination of fancy indexing, arange, and repeat.
The goal is to create arrays of frame, row, and column indices, one for each dimension. Let's take a simple case that is easy to visualize: a 3-frame set of 3x3 matrices, where we want to assign to the upper-left 2x2 sub-array of the first frame, the lower-right 2x2 sub-array of the second frame, and the entire last frame:
import numpy as np
multi_array = np.zeros((3, 3, 3))
slice_rrcc = np.array([[0, 2, 0, 2], [1, 3, 1, 3], [0, 3, 0, 3]])
Let's come up with the indices that match each one, as well as the sizes and shapes:
nframes = slice_rrcc.shape[0]                        # 3
nrows = np.diff(slice_rrcc[:, :2], axis=1).ravel()   # [2, 2, 3]
ncols = np.diff(slice_rrcc[:, 2:], axis=1).ravel()   # [2, 2, 3]
sizes = nrows * ncols                                # [4, 4, 9]
We need the following fancy indices to be able to do the assignment:
frame_index = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2])
row_index = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2])
col_index = np.array([0, 1, 0, 1, 1, 2, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
If we can obtain the arrays frame_index, row_index, and col_index, we can set the data for each segment as follows:
multi_array[frame_index, row_index, col_index] = 1
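As a quick sanity check, the hand-written index arrays above can be applied to the 3x3x3 example and compared against ordinary per-frame slicing. This is only a minimal sketch; the name expected is introduced here for illustration and is not part of the original answer.

```python
import numpy as np

multi_array = np.zeros((3, 3, 3))
slice_rrcc = np.array([[0, 2, 0, 2], [1, 3, 1, 3], [0, 3, 0, 3]])

# Hand-written fancy indices, one entry per element to assign
frame_index = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2])
row_index = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2])
col_index = np.array([0, 1, 0, 1, 1, 2, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
multi_array[frame_index, row_index, col_index] = 1

# Reference result (hypothetical name) built with plain slicing, frame by frame
expected = np.zeros((3, 3, 3))
for frame, (r0, r1, c0, c1) in zip(expected, slice_rrcc):
    frame[r0:r1, c0:c1] = 1

print(np.array_equal(multi_array, expected))
print(multi_array[1])
```

If the two arrays agree, the three index arrays describe exactly the same regions as the four-column slice specification.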
frame_index is easy to obtain:
frame_index = np.repeat(np.arange(nframes), sizes)
row_index takes a bit more work. You need to generate a set of nrows indices for each individual frame, and repeat each of them ncols times. You can do this by generating a continuous range and restarting the count at each frame using subtraction:
row_range = np.arange(nrows.sum())
row_offsets = np.zeros_like(row_range)
row_offsets[np.cumsum(nrows[:-1])] = nrows[:-1]
row_index = row_range - np.cumsum(row_offsets) + np.repeat(slice_rrcc[:, 0], nrows)
segments = np.repeat(ncols, nrows)
row_index = np.repeat(row_index, segments)
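The "restart the count using subtraction" step can be isolated into a small helper to make it easier to follow. The sketch below is an illustration only: the function name concatenated_ranges is hypothetical, and it assumes every length is at least 1 (a zero-length range would break the reset bookkeeping).

```python
import numpy as np

def concatenated_ranges(starts, lengths):
    """Return arange(s, s + n) for each (s, n) pair, joined end to end.

    Assumes all lengths are >= 1.
    """
    starts = np.asarray(starts)
    lengths = np.asarray(lengths)
    # One continuous counter over all ranges: 0, 1, 2, ...
    counter = np.arange(lengths.sum())
    # At the first element of every range but the first, note the length
    # of the previous range so the running sum can subtract it.
    resets = np.zeros_like(counter)
    resets[np.cumsum(lengths[:-1])] = lengths[:-1]
    # Subtracting the running sum restarts the counter at 0 for each range;
    # adding the repeated starts shifts each range to its own offset.
    return counter - np.cumsum(resets) + np.repeat(starts, lengths)

# Row starts [0, 1, 0] and row counts [2, 2, 3] from the 3-frame example
print(concatenated_ranges([0, 1, 0], [2, 2, 3]))  # [0 1 1 2 0 1 2]
```

The result is the per-frame row list before each row is repeated once per column with np.repeat(row_index, segments).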
col_index will be less trivial still. You need to generate a sequence for each row with the right offset, and repeat it in chunks for each row, and then for each frame. The approach is similar to that for row_index, with an additional fancy index to get the order right:
col_index_index = np.arange(sizes.sum())
col_index_resets = np.cumsum(segments[:-1])
col_index_offsets = np.zeros_like(col_index_index)
col_index_offsets[col_index_resets] = segments[:-1]
col_index_offsets[np.cumsum(sizes[:-1])] -= ncols[:-1]
col_index_index -= np.cumsum(col_index_offsets)
col_range = np.arange(ncols.sum())
col_offsets = np.zeros_like(col_range)
col_offsets[np.cumsum(ncols[:-1])] = ncols[:-1]
col_index = col_range - np.cumsum(col_offsets) + np.repeat(slice_rrcc[:, 2], ncols)
col_index = col_index[col_index_index]
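Because the cumulative-sum bookkeeping is easy to get subtly wrong, it may help to compare the vectorized index arrays against a straightforward loop-based construction. The following is a rough sketch under that assumption; reference_indices is a hypothetical helper, not part of the original answer.

```python
import numpy as np

def reference_indices(slice_rrcc):
    """Build frame, row and column indices one frame at a time (slow but obvious)."""
    frames, rows, cols = [], [], []
    for f, (r0, r1, c0, c1) in enumerate(slice_rrcc):
        nr, nc = r1 - r0, c1 - c0
        frames.append(np.full(nr * nc, f))
        # Each row index is repeated once per column ...
        rows.append(np.repeat(np.arange(r0, r1), nc))
        # ... while the column sequence is tiled once per row.
        cols.append(np.tile(np.arange(c0, c1), nr))
    return np.concatenate(frames), np.concatenate(rows), np.concatenate(cols)

slice_rrcc = np.array([[0, 2, 0, 2], [1, 3, 1, 3], [0, 3, 0, 3]])
ref_frame, ref_row, ref_col = reference_indices(slice_rrcc)
print(ref_frame)
print(ref_row)
print(ref_col)
```

Comparing these against frame_index, row_index, and col_index with np.array_equal is a cheap way to confirm that the order of the vectorized arrays is right.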
Using this formulation, you can even step it up and specify a different value for each frame. If you wanted to assign values = [1, 2, 3] to the frames in my example, just do
multi_array[frame_index, row_index, col_index] = np.repeat(values, sizes)
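To make the per-frame assignment concrete, here is a small sketch on the same 3-frame example. It reuses the hand-written index arrays from earlier; per_element is a name introduced only for this illustration.

```python
import numpy as np

multi_array = np.zeros((3, 3, 3))
sizes = np.array([4, 4, 9])    # elements assigned in each frame
values = np.array([1, 2, 3])   # one value per frame

frame_index = np.repeat(np.arange(3), sizes)
row_index = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2])
col_index = np.array([0, 1, 0, 1, 1, 2, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])

# Expand the per-frame values so there is one value per assigned element
per_element = np.repeat(values, sizes)
multi_array[frame_index, row_index, col_index] = per_element

# The center element lies inside all three regions, so it shows each frame's value
print(multi_array[:, 1, 1])  # [1. 2. 3.]
```

np.repeat(values, sizes) works here because the index arrays are ordered frame by frame, so the expanded values line up with them element for element.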
We'll see if there is a more efficient way to do this. One part I asked about is here.
Benchmark
A comparison of your loop vs. my vectorized solution for nframes in {10, 100, 1000} and frame sizes (width x height of each frame in multi_array) of roughly {100, 1000, 10000} elements, i.e. dim in {10, 32, 100}:
def set_slices_loop(arr, slice_rrcc):
    # Assign 1 to each frame's own row/column window, one frame at a time
    for a, s in zip(arr, slice_rrcc):
        a[s[0]:s[1], s[2]:s[3]] = 1

np.random.seed(0xABCDEF)
for nframes in [10, 100, 1000]:
    for dim in [10, 32, 100]:
        print(f'Size = {nframes}x{dim}x{dim}')
        arr = np.zeros((nframes, dim, dim), dtype=int)
        slice = np.zeros((nframes, 4), dtype=int)
        # Random start indices go in columns 0 and 2, end indices in columns 1 and 3
        slice[:, ::2] = np.random.randint(0, dim - 1, size=(nframes, 2))
        slice[:, 1::2] = np.random.randint(slice[:, ::2] + 1, dim, size=(nframes, 2))
        %timeit set_slices_loop(arr, slice)
        arr[:] = 0
        %timeit set_slices(arr, slice, 1)
The results are overwhelmingly in favor of the loop, with the only exception of very large numbers of frames and small frame sizes. Most "normal" cases are an order of magnitude faster with looping:
Looping
        |          Dimension          |
 Frames |   100   |  1000   |  10000  |
--------+---------+---------+---------+
     10 | 33.8 µs | 35.8 µs | 43.4 µs |
--------+---------+---------+---------+
    100 |  310 µs |  331 µs |  401 µs |
--------+---------+---------+---------+
   1000 | 3.09 ms | 3.31 ms | 4.27 ms |
--------+---------+---------+---------+
Vectorized
        |          Dimension          |
 Frames |   100   |  1000   |  10000  |
--------+---------+---------+---------+
     10 |  225 µs |  266 µs |  545 µs |
--------+---------+---------+---------+
    100 |  312 µs |  627 µs | 4.11 ms |
--------+---------+---------+---------+
   1000 | 1.07 ms | 4.63 ms | 48.5 ms |
--------+---------+---------+---------+
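The %timeit lines in the benchmark are IPython magics and will not run in a plain Python script. As a rough sketch, the loop timing for a single configuration could be approximated with the standard timeit module; the variable names slices, number, and best, and the choice of 100 calls with 5 repeats, are assumptions made here for illustration, so the numbers will not match the tables above exactly.

```python
import timeit
import numpy as np

def set_slices_loop(arr, slice_rrcc):
    for a, s in zip(arr, slice_rrcc):
        a[s[0]:s[1], s[2]:s[3]] = 1

np.random.seed(0xABCDEF)
nframes, dim = 100, 32
arr = np.zeros((nframes, dim, dim), dtype=int)
slices = np.zeros((nframes, 4), dtype=int)
# Start indices in columns 0 and 2, strictly larger end indices in columns 1 and 3
slices[:, ::2] = np.random.randint(0, dim - 1, size=(nframes, 2))
slices[:, 1::2] = np.random.randint(slices[:, ::2] + 1, dim, size=(nframes, 2))

# Take the best of several repeats to reduce noise from other processes
number = 100
best = min(timeit.repeat(lambda: set_slices_loop(arr, slices),
                         number=number, repeat=5))
print(f'Size = {nframes}x{dim}x{dim}: {best / number * 1e6:.1f} µs per call')
```

The same pattern applies to set_slices once it is defined; only the callable passed to timeit.repeat changes.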
TL;DR
Can be done, but not recommended:
def set_slices(arr, slice_rrcc, values):
    # Per-frame counts of rows, columns, and total elements to assign
    nframes = slice_rrcc.shape[0]
    nrows = np.diff(slice_rrcc[:, :2], axis=1).ravel()
    ncols = np.diff(slice_rrcc[:, 2:], axis=1).ravel()
    sizes = nrows * ncols
    # Number of columns for every individual row across all frames
    segments = np.repeat(ncols, nrows)

    frame_index = np.repeat(np.arange(nframes), sizes)

    # Row indices: a running counter that restarts at each frame
    row_range = np.arange(nrows.sum())
    row_offsets = np.zeros_like(row_range)
    row_offsets[np.cumsum(nrows[:-1])] = nrows[:-1]
    row_index = row_range - np.cumsum(row_offsets) + np.repeat(slice_rrcc[:, 0], nrows)
    row_index = np.repeat(row_index, segments)

    # Index into the per-frame column ranges, restarting at each row
    col_index_index = np.arange(sizes.sum())
    col_index_resets = np.cumsum(segments[:-1])
    col_index_offsets = np.zeros_like(col_index_index)
    col_index_offsets[col_index_resets] = segments[:-1]
    col_index_offsets[np.cumsum(sizes[:-1])] -= ncols[:-1]
    col_index_index -= np.cumsum(col_index_offsets)

    # Column indices: one range per frame, reordered by col_index_index
    col_range = np.arange(ncols.sum())
    col_offsets = np.zeros_like(col_range)
    col_offsets[np.cumsum(ncols[:-1])] = ncols[:-1]
    col_index = col_range - np.cumsum(col_offsets) + np.repeat(slice_rrcc[:, 2], ncols)
    col_index = col_index[col_index_index]

    # Accept either a single value or one value per frame
    values = np.asarray(values)
    if values.size == 1:
        arr[frame_index, row_index, col_index] = values
    else:
        arr[frame_index, row_index, col_index] = np.repeat(values, sizes)
Solution 2:
This is a benchmarking post using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
We are benchmarking set_slices from @Mad Physicist's solution, with arr[frame_index, row_index, col_index] = 1, and set_slices_loop without any changes, to get the runtime (sec).
np.random.seed(0xABCDEF)
in_ = {}
for nframes in [10, 100, 1000]:
    for dim in [10, 32, 100]:
        arr = np.zeros((nframes, dim, dim), dtype=int)
        slice = np.zeros((nframes, 4), dtype=int)
        slice[:, ::2] = np.random.randint(0, dim - 1, size=(nframes, 2))
        slice[:, 1::2] = np.random.randint(slice[:, ::2] + 1, dim, size=(nframes, 2))
        in_[(nframes, dim)] = [arr, slice]

import benchit
funcs = [set_slices, set_slices_loop]
t = benchit.timings(funcs, in_, input_name=['NumFrames', 'Dim'], multivar=True)
t.plot(sp_argID=1, logx=True, save='timings.png')