Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas shuffle rows at a certain level

Tags:

python

pandas

I have a Pandas DataFrame using a MultiIndex on the rows:

index = pandas.MultiIndex.from_tuples(list(itertools.product(range(3), range(3))))
df = pandas.DataFrame(numpy.random.randn(9,3), index=index, columns=['A', 'B', 'C'])

            A         B         C
0 0  2.400417  0.698638  1.231540
  1 -0.023154 -2.110450  0.774964
  2 -1.282392 -0.062794  1.471655
1 0 -1.081853  0.261876 -1.771075
  1 -2.013747 -0.377957 -0.393802
  2  1.711172 -0.552468  1.018727
2 0  0.155821 -0.222691  0.496586
  1  0.563638 -0.756709  1.050212
  2 -1.446159 -0.891549  0.256695

I would like to shuffle this DataFrame on the first level of the index, so a possible result would be:

            A         B         C
1 0 -1.081853  0.261876 -1.771075
  1 -2.013747 -0.377957 -0.393802
  2  1.711172 -0.552468  1.018727
0 0  2.400417  0.698638  1.231540
  1 -0.023154 -2.110450  0.774964
  2 -1.282392 -0.062794  1.471655
2 0  0.155821 -0.222691  0.496586
  1  0.563638 -0.756709  1.050212
  2 -1.446159 -0.891549  0.256695
like image 754
Marijn van Vliet Avatar asked May 04 '12 08:05

Marijn van Vliet


2 Answers

The reindex method can accomplish this when passed a reordered array of tuples matching the desired order. At which point, reordering can be done as best fits your problem. For example:

In [38]: df
Out[38]: 
            A         B         C
0 0 -1.725337  0.111493  0.178294
  1 -1.809003 -0.614219 -0.931909
  2  0.621427 -0.186233  0.254727
1 0 -1.322863  1.242415  1.375579
  1  0.249738 -1.280204  0.356491
  2 -0.743671  0.325841 -0.167772
2 0 -0.070937  0.401172 -1.790801
  1  1.433794  2.257198  1.848435
  2 -1.021557 -1.054363 -1.485536

In [39]: neworder = [1, 0, 2]

In [41]: newindex = sorted(df.index, key=lambda x: neworder.index(x[0]))

In [42]: newindex
Out[42]: 
[(1L, 0L),
 (1L, 1L),
 (1L, 2L),
 (0L, 0L),
 (0L, 1L),
 (0L, 2L),
 (2L, 0L),
 (2L, 1L),
 (2L, 2L)]

In [43]: df.reindex(newindex)
Out[43]: 
            A         B         C
1 0 -1.322863  1.242415  1.375579
  1  0.249738 -1.280204  0.356491
  2 -0.743671  0.325841 -0.167772
0 0 -1.725337  0.111493  0.178294
  1 -1.809003 -0.614219 -0.931909
  2  0.621427 -0.186233  0.254727
2 0 -0.070937  0.401172 -1.790801
  1  1.433794  2.257198  1.848435
  2 -1.021557 -1.054363 -1.485536
like image 124
Garrett Avatar answered Oct 23 '22 08:10

Garrett


A simple, concise solution is

icol = df.index.levels[k]
df.loc[icol[permutation]]

which works for the outermost level. To get a general level, either use swaplevel or IndexSlice

Full MWE:

import pandas
from io import StringIO

table = """   
i j  A         B         C
0 0  2.400417  0.698638  1.231540
  1 -0.023154 -2.110450  0.774964
  2 -1.282392 -0.062794  1.471655
1 0 -1.081853  0.261876 -1.771075
  1 -2.013747 -0.377957 -0.393802
  2  1.711172 -0.552468  1.018727
2 0  0.155821 -0.222691  0.496586
  1  0.563638 -0.756709  1.050212
  2 -1.446159 -0.891549  0.256695
"""

df = pandas.read_fwf(StringIO(table), dtype={"i":"UInt8", "j":"UInt8"})
df = df.ffill().set_index(["i", "j"])

i, j = df.index.levels
idx = pandas.IndexSlice
perm = [2,0,1]
df.loc[i[perm]]

Returns

            A         B         C
i j                              
2 0  0.155821 -0.222691  0.496586
  1  0.563638 -0.756709  1.050212
  2 -1.446159 -0.891549  0.256695
0 0  2.400417  0.698638  1.231540
  1 -0.023154 -2.110450  0.774964
  2 -1.282392 -0.062794  1.471655
1 0 -1.081853  0.261876 -1.771075
  1 -2.013747 -0.377957 -0.393802
  2  1.711172 -0.552468  1.018727

And df.loc[idx[i, j[perm]], :] gives

            A         B         C
i j                              
0 2 -1.282392 -0.062794  1.471655
  0  2.400417  0.698638  1.231540
  1 -0.023154 -2.110450  0.774964
1 2  1.711172 -0.552468  1.018727
  0 -1.081853  0.261876 -1.771075
  1 -2.013747 -0.377957 -0.393802
2 2 -1.446159 -0.891549  0.256695
  0  0.155821 -0.222691  0.496586
  1  0.563638 -0.756709  1.050212

But be careful! Although df.loc[idx[i[perm], :], :] and df.loc[idx[i[[perm]], j], :] give the same result, the above df.loc[idx[i, j[perm]], :] returns something different than the (maybe) unexpected result from df.loc[idx[:, j[perm]], :]:

            A         B         C
i j                              
0 2 -1.282392 -0.062794  1.471655
1 2  1.711172 -0.552468  1.018727
2 2 -1.446159 -0.891549  0.256695
0 0  2.400417  0.698638  1.231540
1 0 -1.081853  0.261876 -1.771075
2 0  0.155821 -0.222691  0.496586
0 1 -0.023154 -2.110450  0.774964
1 1 -2.013747 -0.377957 -0.393802
2 1  0.563638 -0.756709  1.050212
like image 30
Hyperplane Avatar answered Oct 23 '22 08:10

Hyperplane