Randomly Selecting Rows From Numpy Array
I want to randomly select rows from a numpy array. Say I have this array- A = [[1, 3, 0], [3, 2, 0], [0, 2, 1], [1, 1, 4], [3, 2, 2], [0, 1, 0], [1, 3
Solution 1:
You can make any number of row-wise random partitions of A
by slicing a shuffled sequence of row indices:
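import numpy
# Shuffle the row indices in place, then slice them into two disjoint groups
ind = numpy.arange( A.shape[ 0 ] )
numpy.random.shuffle( ind )
B = A[ ind[ :6 ], : ]  # first 6 shuffled rows
C = A[ ind[ 6: ], : ]  # all remaining rows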
If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:
B = A[ sorted( ind[ :6 ] ), : ]
C = A[ sorted( ind[ 6: ] ), : ]
(Note that the solution provided by @MaxNoe also preserves row order.)
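The steps above can be put together into one runnable sketch. The array here is a hypothetical stand-in (the question's array is cut off), and the seeded `numpy.random.default_rng` generator is an assumption made for reproducibility rather than part of the original answer:

```python
import numpy as np

# Hypothetical 10x3 stand-in for the question's array
A = np.arange(30).reshape(10, 3)

# Seeded generator (assumption) so the split is reproducible
rng = np.random.default_rng(0)

# permutation() returns a shuffled copy of 0..n-1 in one call
ind = rng.permutation(A.shape[0])

# Sorting each slice keeps the original row order inside each subset
B = A[np.sort(ind[:6])]
C = A[np.sort(ind[6:])]

print(B.shape, C.shape)
```

Every row of A ends up in exactly one of B or C, because the two slices of `ind` are disjoint and together cover all indices.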
Solution 2:
Solution
This gives you the indices for the selection:
sel = np.random.choice(A.shape[0], size=6, replace=False)
and this gives B:
B = A[sel]
Get all indices that were not selected:
unsel = list(set(range(A.shape[0])) - set(sel))
and use them for C:
C = A[unsel]
Variation with NumPy functions
Instead of using set and list, you can use this:
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
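As a minimal end-to-end sketch of this variation, assuming a hypothetical 10x3 array in place of the question's truncated one and a seeded generator for reproducibility:

```python
import numpy as np

# Hypothetical 10x3 stand-in for the question's array
A = np.arange(30).reshape(10, 3)

# Seeded generator (assumption) so the selection is reproducible
rng = np.random.default_rng(1)

# Draw 6 distinct row indices
sel = rng.choice(A.shape[0], size=6, replace=False)

# setdiff1d returns the sorted indices that are not in sel
unsel = np.setdiff1d(np.arange(A.shape[0]), sel)

B = A[sel]    # selected rows, in the random order they were drawn
C = A[unsel]  # remaining rows, in their original order

print(sel, unsel)
```

Because `np.setdiff1d` returns sorted unique values, C keeps the original row order, while B follows the order in which `sel` was drawn.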
For the example array the pure Python version:
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
100000 loops, best of 3: 8.42 µs per loop
is faster than the NumPy version:
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
10000 loops, best of 3: 77.5 µs per loop
For larger A, the NumPy version is faster:
A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
1000 loops, best of 3: 1.4 ms per loop
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
1000 loops, best of 3: 315 µs per loop
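The `%%timeit` cells above only run in IPython or Jupyter. Outside a notebook, a comparable measurement could be sketched with the standard-library `timeit` module. The helper names `unsel_python` and `unsel_numpy` are hypothetical, and the numbers will differ by machine and NumPy version:

```python
import timeit

import numpy as np

# Seeded generator (assumption) so the inputs are reproducible
rng = np.random.default_rng(3)
A = rng.random((10_000, 3))
sel = rng.choice(A.shape[0], size=6, replace=False)


def unsel_python():
    # Pure-Python set difference of row indices
    return list(set(range(A.shape[0])) - set(sel))


def unsel_numpy():
    # NumPy set difference of row indices (result is sorted)
    return np.setdiff1d(np.arange(A.shape[0]), sel)


runs = 100
t_python = timeit.timeit(unsel_python, number=runs)
t_numpy = timeit.timeit(unsel_numpy, number=runs)

print(f"set/list:  {t_python / runs * 1e6:.1f} µs per call")
print(f"setdiff1d: {t_numpy / runs * 1e6:.1f} µs per call")
```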
Solution 3:
You can use boolean masks and draw random indices from an integer array which is as long as yours. The ~ operator is an elementwise NOT:
idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)
selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True
B = A[mask]
C = A[~mask]
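A self-contained version of the mask approach might look like the following. The array is a hypothetical stand-in for the question's data and the seeded generator is an assumption:

```python
import numpy as np

# Hypothetical 10x3 stand-in for the question's array
A = np.arange(30).reshape(10, 3)

# Seeded generator (assumption) so the selection is reproducible
rng = np.random.default_rng(2)

# One boolean flag per row, all False to start
mask = np.zeros(A.shape[0], dtype=bool)

# Flip 6 distinct, randomly chosen rows to True
mask[rng.choice(A.shape[0], size=6, replace=False)] = True

B = A[mask]   # rows where the mask is True
C = A[~mask]  # rows where the mask is False

print(mask)
```

Boolean indexing walks the rows in their stored order, so both B and C preserve the row order of A without an explicit sort.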