Fill Matrix Of Occurences From Column/row Arrays Of Indexes
Solution 1:
Use np.add.at
, specifying a tuple of indices:
>>> np.add.at(new_matrix, (rows, columns), 1)
>>> new_matrix
array([[ 1., 0., 0.],
[ 0., 2., 0.],
[ 0., 1., 2.],
[ 2., 1., 5.]])
np.add.at
operates on the array in-place, adding 1
as many times to the indices as specified by the (row, columns)
tuple.
Solution 2:
Approach #1
We can convert those pairs to linear indices and then use np.bincount
-
defbincount_app(rows, columns, n_rows, n_columns):
# Get linear index equivalent
lidx = (columns.max()+1)*rows + columns
# Use binned count on the linear indicesreturn np.bincount(lidx, minlength=n_rows*n_columns).reshape(n_rows,n_columns)
Sample run -
In [242]: n_rows = 4
...: n_columns = 3
...:
...: rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
...: columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
In [243]: bincount_app(rows, columns, n_rows, n_columns)
Out[243]:
array([[1, 0, 0],
[0, 2, 0],
[0, 1, 2],
[2, 1, 5]])
Approach #2
Alternatively, we can sort the linear indices and get the counts using slicing
to have our second approach, like so -
def mask_diff_app(rows, columns, n_rows, n_columns):
lidx = (columns.max()+1)*rows + columns
lidx.sort()
mask = np.concatenate(([True],lidx[1:] != lidx[:-1],[True]))
count = np.diff(np.flatnonzero(mask))
new_matrix = np.zeros([n_rows, n_columns],dtype=int)
new_matrix.flat[lidx[mask[:-1]]] = count
return new_matrix
Approach #3
This seems like a straight-forward one with sparse matrix csr_matrix
as well, as it does accumulation on its own for repeated indices. The benefit is the memory efficiency, given that it's a sparse matrix, which would be noticeable if you are filling a small number of places in the output and a sparse matrix output is okay.
The implementation would look something like this -
from scipy.sparse import csr_matrix
defsparse_matrix_app(rows, columns, n_rows, n_columns):
out_shp = (n_rows, n_columns)
data = np.ones(len(rows),dtype=int)
return csr_matrix((data, (rows, columns)), shape=out_shp)
If you need a regular/dense array, simply do -
sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Sample output -
In [319]: sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Out[319]:
array([[1, 0, 0],
[0, 2, 0],
[0, 1, 2],
[2, 1, 5]])
Benchmarking
Other approach(es) -
# @cᴏʟᴅsᴘᴇᴇᴅ's solndef add_at_app(rows, columns, n_rows, n_columns):
new_matrix = np.zeros([n_rows, n_columns],dtype=int)
np.add.at(new_matrix, (rows, columns), 1)
Timings
Case #1 : Output array of shape (1000, 1000)
and no. of indices = 10k
In [307]:# Setup...:n_rows=1000...:n_columns=1000...:rows=np.random.randint(0,1000,(10000))...:columns=np.random.randint(0,1000,(10000))In [308]:%timeitadd_at_app(rows,columns,n_rows,n_columns)...:%timeitbincount_app(rows,columns,n_rows,n_columns)...:%timeitmask_diff_app(rows,columns,n_rows,n_columns)...:%timeitsparse_matrix_app(rows,columns,n_rows,n_columns)1000 loops,best of 3:1.05msperloop1000 loops,best of 3:424µsperloop1000 loops,best of 3:1.05msperloop1000 loops,best of 3:1.41msperloop
Case #2 : Output array of shape (1000, 1000)
and no. of indices = 100k
In [309]: # Setup
...: n_rows =1000
...: n_columns =1000
...: rows= np.random.randint(0,1000,(100000))
...: columns = np.random.randint(0,1000,(100000))
In [310]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of3: 11.4 ms per loop
1000 loops, best of3: 1.27 ms per loop
100 loops, best of3: 7.44 ms per loop
10 loops, best of3: 20.4 ms per loop
Case #3 : Sparse-ness in output
As stated earlier, for the sparse method to work better, we would need sparse-ness. Such a case would be like this -
In [314]: # Setup
...: n_rows =5000
...: n_columns =5000
...: rows= np.random.randint(0,5000,(1000))
...: columns = np.random.randint(0,5000,(1000))
In [315]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of3: 11.7 ms per loop
100 loops, best of3: 11.1 ms per loop
100 loops, best of3: 11.1 ms per loop
1000 loops, best of3: 269 µs per loop
If you need a dense array, we lose the memory efficiency and hence performance one as well -
In [317]: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
100 loops, best of 3: 11.7 ms per loop
Post a Comment for "Fill Matrix Of Occurences From Column/row Arrays Of Indexes"