Randomizing/shuffling Rows In A Dataframe In Pandas
I am currently trying to find a way to randomize items in a dataframe row-wise. I found this thread on shuffling/permutation column-wise in pandas (shuffling/permutating a DataFram
Solution 1:
Edit: I misunderstood the question, which was just to shuffle rows, not the whole table (right?)
I don't think using a DataFrame makes much sense here, because the column names become meaningless once values move between columns. So you can just use a 2D NumPy array:
In [1]: A
Out[1]:
array([[11, 'Blue', 'Mon'],
[8, 'Red', 'Tues'],
[10, 'Green', 'Wed'],
[15, 'Yellow', 'Thurs'],
[11, 'Black', 'Fri']], dtype=object)
In [2]: _ = [np.random.shuffle(i) for i in A]  # shuffle in-place, so returns None
In [3]: A
Out[3]:
array([['Mon', 11, 'Blue'],
[8, 'Tues', 'Red'],
['Wed', 10, 'Green'],
['Thurs', 15, 'Yellow'],
[11, 'Black', 'Fri']], dtype=object)
And if you want to keep a DataFrame:
In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]:
   Number  color     day
0     Mon     11    Blue
1       8   Tues     Red
2     Wed     10   Green
3   Thurs     15  Yellow
4      11  Black     Fri
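As a rough sketch, the same per-row shuffle can also be done without a Python-level loop and without modifying the original data. The function name `shuffle_within_rows` and the `seed` parameter are made up for illustration, and the sample frame is rebuilt from the values shown in this thread. The idea is to draw one random key per cell, sort the keys within each row, and use the resulting column order to rearrange the values:

```python
import numpy as np
import pandas as pd


def shuffle_within_rows(df, seed=None):
    """Return a new DataFrame with the values of each row permuted independently.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    # Sorting random keys along axis 1 gives one random column order per row.
    order = rng.random(values.shape).argsort(axis=1)
    shuffled = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)


# Sample data rebuilt from the values used in this thread.
data = pd.DataFrame({
    "Number": [11, 8, 10, 15, 11],
    "color": ["Blue", "Red", "Green", "Yellow", "Black"],
    "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
})

result = shuffle_within_rows(data, seed=0)
print(result)
```

Passing a fixed `seed` makes the result repeatable; leaving it as `None` gives a different shuffle on each call.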
Here is a function that shuffles all values across both rows and columns:
import numpy as np
import pandas as pd

def shuffle(df):
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()  # flatten() returns a copy, so df itself is not modified
    np.random.shuffle(val_flat)  # shuffles the flat copy in place
    return pd.DataFrame(val_flat.reshape(shape), columns=col)
In [2]: data
Out[2]:
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri
In [3]: shuffle(data)
Out[3]:
   Number  color     day
0     Fri    Wed  Yellow
1   Thurs  Black     Red
2   Green   Blue      11
3      11      8      10
4     Mon   Tues      15
Hope this helps
Solution 2:
Maybe flatten the 2d array and then shuffle?
In [21]: data2=dataframe.values.flatten()
In [22]: np.random.shuffle(data2)
In [23]: dataframe2 = pd.DataFrame(data2.reshape(dataframe.shape), columns=dataframe.columns)
In [24]: dataframe2
Out[24]:
   Number   color    day
0    Tues  Yellow     11
1     Red   Green    Wed
2   Thurs     Mon   Blue
3      15       8  Black
4     Fri      11     10
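If the shuffle needs to be repeatable, the flatten-and-reshape idea above could be wrapped around a seeded NumPy generator. This is only a sketch: `shuffle_all_cells` and its `seed` parameter are hypothetical names, and the sample frame is rebuilt from the values shown in this thread.

```python
import numpy as np
import pandas as pd


def shuffle_all_cells(df, seed=None):
    """Return a new DataFrame with every cell moved to a random position.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)
    # permutation() returns a shuffled copy, so the input frame is left alone.
    flat = rng.permutation(df.to_numpy().ravel())
    return pd.DataFrame(flat.reshape(df.shape), index=df.index, columns=df.columns)


# Sample data rebuilt from the values used in this thread.
data = pd.DataFrame({
    "Number": [11, 8, 10, 15, 11],
    "color": ["Blue", "Red", "Green", "Yellow", "Black"],
    "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
})

shuffled = shuffle_all_cells(data, seed=42)
print(shuffled)
```

Calling the function twice with the same `seed` should give the same arrangement, which helps when a result has to be reproduced later.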
Solution 3:
Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287, which uses np.apply_along_axis():
a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32],[40, 41, 42]])
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
[22 21 20]
[31 30 32]
[40 41 42]]
See the full answer for how this could be integrated with a pandas DataFrame.
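For reference, one way the np.apply_along_axis() call might be combined with a DataFrame is sketched below. This is not necessarily how the linked answer does it; the column names `a`, `b`, `c` are made up, and a seeded generator is used in place of np.random.permutation only so that the run is repeatable.

```python
import numpy as np
import pandas as pd

# Same numbers as the array above; the column names are hypothetical.
df = pd.DataFrame(
    [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]],
    columns=["a", "b", "c"],
)

rng = np.random.default_rng(0)

# Apply a permutation to each row (axis=1) of the underlying array,
# then rebuild a DataFrame with the original index and column labels.
permuted = np.apply_along_axis(rng.permutation, 1, df.to_numpy())
df_permuted = pd.DataFrame(permuted, index=df.index, columns=df.columns)
print(df_permuted)
```

Each row of `df_permuted` holds the same values as the matching row of `df`, only in a different order.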