Is it possible to shuffle several DataFrames together?
For example, I have a DataFrame df1 and a DataFrame df2. I want to shuffle the rows randomly, but in the same way for both DataFrames.
Example
df1:
|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
df2:
|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
After shuffling, a possible order for both DataFrames could be:
|___|_______|
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
| 1 |  ...  |
I think you can reindex both DataFrames with the same permutation, created by applying numpy.random.permutation to the index. This requires both DataFrames to have the same length and the same unique index values:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a':range(5)})
print (df1)
   a
0  0
1  1
2  2
3  3
4  4
df2 = pd.DataFrame({'a':range(5)})
print (df2)
   a
0  0
1  1
2  2
3  3
4  4
idx = np.random.permutation(df1.index)
print (df1.reindex(idx))
   a
2  2
4  4
1  1
3  3
0  0
print (df2.reindex(idx))
   a
2  2
4  4
1  1
3  3
0  0
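If the two DataFrames have the same length but different index labels, reindex would not line up. One possible workaround is to permute by position with iloc instead. The following is only a minimal sketch: the helper name shuffle_together and its seed parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def shuffle_together(*frames, seed=None):
    """Reorder the rows of every frame with one shared positional permutation.

    Hypothetical helper for illustration; assumes all frames have equal length.
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all DataFrames must have the same number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)  # positions 0..n-1 in random order
    # iloc selects by position, so the index labels do not need to match
    return [f.iloc[order] for f in frames]


df1 = pd.DataFrame({"a": range(5)})
df2 = pd.DataFrame({"b": list("vwxyz")}, index=[10, 11, 12, 13, 14])

s1, s2 = shuffle_together(df1, df2, seed=0)
print(s1)
print(s2)
```

Each returned DataFrame keeps its original index labels, so row k of s1 and row k of s2 still refer to the same original position.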
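Older pandas versions also offered reindex_axis as an alternative. It was deprecated in pandas 0.21 and removed in 1.0, so on current versions pass axis to reindex instead:
print (df1.reindex(idx, axis=0))
print (df2.reindex(idx, axis=0))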
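Another option that may be worth trying is DataFrame.sample with frac=1 and the same random_state for each DataFrame. This sketch assumes both DataFrames have the same number of rows, in which case the same seed should select the same row positions; the seed value 42 is arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({"a": range(5)})
df2 = pd.DataFrame({"b": range(10, 15)})

seed = 42  # arbitrary; any fixed integer, reused for every DataFrame

# frac=1 returns all rows in random order; the shared seed keeps the order aligned
shuffled1 = df1.sample(frac=1, random_state=seed)
shuffled2 = df2.sample(frac=1, random_state=seed)

# Check that both shuffles used the same row order
print(shuffled1.index.equals(shuffled2.index))
```

Checking the indexes afterwards, as in the last line, is a cheap way to confirm the alignment on your pandas version.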