Is it possible to shuffle several DataFrames together?
For example, I have a DataFrame df1 and a DataFrame df2. I want to shuffle the rows randomly, but in the same way for both DataFrames.
Example
df1:
|___|_______|
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
df2:
|___|_______|
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
After shuffling, a possible order for both DataFrames could be:
|___|_______|
| 2 | ... |
| 3 | ... |
| 4 | ... |
| 1 | ... |
I think you can call reindex on both DataFrames with the same result of numpy.random.permutation applied to the index. Both DataFrames must have the same length and the same unique index values:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a':range(5)})
print (df1)
a
0 0
1 1
2 2
3 3
4 4
df2 = pd.DataFrame({'a':range(5)})
print (df2)
a
0 0
1 1
2 2
3 3
4 4
idx = np.random.permutation(df1.index)
print (df1.reindex(idx))
a
2 2
4 4
1 1
3 3
0 0
print (df2.reindex(idx))
a
2 2
4 4
1 1
3 3
0 0
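The steps above can be wrapped in a small helper. This is only a sketch: the name shuffle_together and its seed parameter are hypothetical (not part of pandas), and it assumes every DataFrame carries the same set of unique index labels, as the answer requires.

```python
import numpy as np
import pandas as pd


def shuffle_together(frames, seed=None):
    """Reorder every frame with one shared random permutation of the index.

    Hypothetical helper: assumes all frames have the same unique index labels.
    """
    first = frames[0]
    # reindex can only line rows up by label if the labels are unique.
    if not first.index.is_unique:
        raise ValueError("index labels must be unique")
    for frame in frames[1:]:
        if len(frame) != len(first) or set(frame.index) != set(first.index):
            raise ValueError("all frames must share the same index labels")

    # One permutation of the labels, reused for every frame.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(first.index.to_numpy())
    return [frame.reindex(idx) for frame in frames]


df1 = pd.DataFrame({"a": range(5)})
df2 = pd.DataFrame({"b": list("vwxyz")})

# Passing a seed makes the shuffle repeatable between runs.
s1, s2 = shuffle_together([df1, df2], seed=0)
print(s1)
print(s2)
```

Both results come back in the same row order, so row i of s1 still corresponds to row i of s2.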
Alternative with reindex_axis (deprecated since pandas 0.21 and removed in pandas 1.0, so it only works on older versions):
print (df1.reindex_axis(idx, axis=0))
print (df2.reindex_axis(idx, axis=0))
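If the DataFrames have the same length but their index labels differ or contain duplicates, shuffling by position instead of by label is one possible workaround. The sketch below uses iloc with a shared permutation of row positions; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: different index labels, df2 even has duplicates.
df1 = pd.DataFrame({"a": range(5)}, index=list("abcde"))
df2 = pd.DataFrame({"b": range(10, 15)}, index=[7, 7, 8, 9, 9])

# Permute row positions 0..n-1 once and apply the same order to both frames.
rng = np.random.default_rng(42)
order = rng.permutation(len(df1))

shuffled1 = df1.iloc[order]
shuffled2 = df2.iloc[order]

print(shuffled1)
print(shuffled2)
```

Because iloc selects by position, the original index labels travel with their rows and only the lengths need to match.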