
Randomly Selecting Rows From Numpy Array

I want to randomly select rows from a numpy array. Say I have this array: A = [[1, 3, 0], [3, 2, 0], [0, 2, 1], [1, 1, 4], [3, 2, 2], [0, 1, 0], [1, 3

Solution 1:

You can make any number of row-wise random partitions of A by slicing a shuffled sequence of row indices:

import numpy

ind = numpy.arange(A.shape[0])
numpy.random.shuffle(ind)  # shuffles the row indices in place
B = A[ind[:6], :]
C = A[ind[6:], :]
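The same shuffle-and-slice idea can also be written with NumPy's newer Generator API. Below is a minimal sketch that assumes a small stand-in array, because the question's array is cut off. Here rng.permutation(n) returns a shuffled copy of arange(n) and does not shuffle in place. The helper name split_rows and the seed are illustrative choices, not part of the original answer.

```python
import numpy as np

def split_rows(A, k, seed=None):
    """Hypothetical helper: k random rows go to B, all remaining rows to C."""
    rng = np.random.default_rng(seed)
    ind = rng.permutation(A.shape[0])  # shuffled copy of arange(n), nothing modified in place
    return A[ind[:k]], A[ind[k:]]

# Stand-in array: the question's array is truncated, so 10 distinct rows are used here.
A = np.arange(30).reshape(10, 3)
B, C = split_rows(A, 6, seed=0)
print(B.shape, C.shape)  # (6, 3) (4, 3)
```

Every row of A ends up in exactly one of B and C, so together they form a partition of the rows.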

If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:

B = A[sorted(ind[:6]), :]
C = A[sorted(ind[6:]), :]
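The following sketch checks that sorting the index slices really does keep the rows in their original relative order. It uses np.sort in place of the built-in sorted (same effect here, but it returns an array). The stand-in array and the seed are assumptions for illustration.

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # stand-in: row i is [3i, 3i+1, 3i+2]
rng = np.random.default_rng(42)
ind = rng.permutation(A.shape[0])

# Sorting each slice of indices restores ascending row order within B and C.
B = A[np.sort(ind[:6])]
C = A[np.sort(ind[6:])]

# Column 0 grows with the row number, so it must be strictly increasing in B and C.
print(np.all(np.diff(B[:, 0]) > 0), np.all(np.diff(C[:, 0]) > 0))  # True True
```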

(Note that the solution provided by @MaxNoe also preserves row order.)

Solution 2:

Solution

This gives you the indices for the selection:

import numpy as np

sel = np.random.choice(A.shape[0], size=6, replace=False)

and this gives B:

B = A[sel]
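With NumPy 1.17 or later, the same selection can be drawn from a Generator object. This sketch assumes a seeded generator for reproducibility and a stand-in array. Generator.choice accepts an int as the population (meaning arange of that int) and takes replace=False just like np.random.choice.

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # stand-in for the question's array
rng = np.random.default_rng(0)    # seeded only so the result is reproducible

# An int first argument means "sample from arange(A.shape[0])".
sel = rng.choice(A.shape[0], size=6, replace=False)
B = A[sel]  # integer-array indexing copies the rows, in the order given by sel

print(len(set(sel.tolist())), B.shape)  # 6 (6, 3)
```

Because replace=False, the six indices are guaranteed to be distinct, so B never contains the same row twice.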

Get all unselected indices:

unsel = list(set(range(A.shape[0])) - set(sel))

and use them for C:

C = A[unsel]

Variation with NumPy functions

Instead of using set and list, you can use this:

unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
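np.setdiff1d returns the sorted unique values of its first argument that are not in the second. As a result, C comes out in A's original row order, which the set/list version does not promise. The sketch below puts the whole answer into one function. The name split_by_choice and the stand-in array are illustrative assumptions.

```python
import numpy as np

def split_by_choice(A, k, rng):
    """Hypothetical helper: B gets k random rows, C gets all the others."""
    sel = rng.choice(A.shape[0], size=k, replace=False)
    unsel = np.setdiff1d(np.arange(A.shape[0]), sel)  # sorted complement of sel
    return A[sel], A[unsel]

A = np.arange(30).reshape(10, 3)  # stand-in array with distinct rows
B, C = split_by_choice(A, 6, np.random.default_rng(7))
print(B.shape, C.shape)  # (6, 3) (4, 3)
```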

For the example array, the pure Python version:

%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))

100000 loops, best of 3: 8.42 µs per loop

is faster than the NumPy version:

%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

10000 loops, best of 3: 77.5 µs per loop

For a larger A, the NumPy version is faster:

A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)

%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))

1000 loops, best of 3: 1.4 ms per loop

%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)

1000 loops, best of 3: 315 µs per loop
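The %%timeit cell magic is available only in IPython and Jupyter. In a plain Python script, the standard-library timeit module can run a comparable measurement. This is a rough sketch: the array size, call counts, and repeat counts are arbitrary choices. No particular timings are implied, because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

A = np.random.random((10_000, 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)

def pure_python():
    return list(set(range(A.shape[0])) - set(sel))

def with_numpy():
    return np.setdiff1d(np.arange(A.shape[0]), sel)

# Both approaches should find the same complement (set order may differ, so sort).
assert sorted(pure_python()) == with_numpy().tolist()

for name, fn in [("set/list", pure_python), ("np.setdiff1d", with_numpy)]:
    # Best of 3 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name}: {best * 1e6:.1f} µs per call")
```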

Solution 3:

You can use a boolean mask, drawing the random indices from an integer array with one entry per row of A. The ~ operator is an elementwise NOT on boolean arrays, so ~mask selects exactly the rows that were not chosen:

import numpy as np

idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)

selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True

B = A[mask]
C = A[~mask]
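Boolean indexing reads the mask from top to bottom, so both B and C keep A's original row order. Below is a self-contained sketch of the mask approach. It sets the mask directly from rng.choice on the row count instead of building idx first (equivalent here). The stand-in array and the seed are assumptions.

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # stand-in array
rng = np.random.default_rng(3)

mask = np.zeros(A.shape[0], dtype=bool)
mask[rng.choice(A.shape[0], size=6, replace=False)] = True  # mark 6 distinct rows

B = A[mask]   # selected rows, in their original order
C = A[~mask]  # ~ flips every entry, so C holds exactly the unselected rows
print(B.shape, C.shape, mask.sum())  # (6, 3) (4, 3) 6
```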
