I have some code that summarizes a DataFrame containing the famous Titanic dataset as follows:
titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100],
                           labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()
This produces the following Series, with a MultiIndex built from the groupby keys:
agecat      pclass  sex
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64
However, I want the agecat level of the MultiIndex in its natural order, ['child', 'adolescent', 'adult', 'senior'], rather than in alphabetical order. But when I try to do this with reindex:
titanic.groupby(['agecat', 'pclass','sex'])['survived'].mean().reindex(
['child', 'adolescent', 'adult', 'senior'], level='agecat')
it has no effect on the MultiIndex of the resulting Series. Should this work, or am I using the wrong approach?
You need to pass reindex a complete MultiIndex that is already in the order you want. For example:
In [36]: index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                      ['one', 'two', 'three']],
                              codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                     [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                              names=['first', 'second'])
In [37]: df = pd.DataFrame(np.random.randn(10, 3), index=index,
                           columns=pd.Index(['A', 'B', 'C'], name='exp'))
In [38]: df
Out[38]:
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037
Simulate a reordering by sorting on the second level:
In [39]: idx = df.sort_index(level='second').index
In [40]: idx
Out[40]:
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]
In [41]: df.reindex(idx)
Out[41]:
exp A B C
first second
foo one -1.007742 2.594146 1.211697
bar one -0.201222 0.060552 0.480552
qux one -0.353886 -1.200079 0.493888
foo two 1.280218 0.799940 0.039380
bar two -0.758227 0.457597 -0.648014
baz two -0.326620 1.046366 -2.047380
qux two -0.124532 0.114733 1.991793
foo three -0.501615 -0.136437 0.997753
baz three 0.395894 1.128850 -1.126649
qux three -1.042094 1.079344 -0.153037
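The same idea applied to a Series shaped like the one in the question, as a sketch only: it uses a small hypothetical Series with made-up values in place of the Titanic groupby result, and it builds the full reordered MultiIndex by sorting the existing index tuples on each agecat label's position in the desired order (order, rank and new_index are names introduced here for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the groupby result in the question (made-up values).
index = pd.MultiIndex.from_tuples(
    [('adolescent', 1, 'female'), ('adult', 1, 'male'),
     ('child', 2, 'female'), ('senior', 3, 'male')],
    names=['agecat', 'pclass', 'sex'])
s = pd.Series([0.1, 0.2, 0.3, 0.4], index=index, name='survived')

# Desired order of the agecat level, and each label's position in it.
order = ['child', 'adolescent', 'adult', 'senior']
rank = {label: pos for pos, label in enumerate(order)}

# Sort the existing (agecat, pclass, sex) tuples by agecat position only.
# sorted() is stable, so rows sharing an agecat keep their current relative order.
new_index = pd.MultiIndex.from_tuples(
    sorted(s.index, key=lambda key: rank[key[0]]), names=s.index.names)

# Reindexing with a complete MultiIndex reorders the rows.
reordered = s.reindex(new_index)
print(reordered)
```

Because new_index contains exactly the same tuples as the original index, reindex only changes the row order and introduces no missing values.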
A different ordering (the union of the two slices, which pandas returns sorted):
In [42]: idx = idx[5:].union(idx[:5])
In [43]: idx
Out[43]:
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]
In [44]: df.reindex(idx)
Out[44]:
exp                  A         B         C
first second
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
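A different technique than reindexing, assuming a reasonably recent pandas version: when pd.cut is given labels it returns an ordered Categorical whose categories follow the labels list, and groupby sorts a categorical key by its category order rather than alphabetically. The sketch below uses hypothetical toy rows (not the real Titanic data); observed=True keeps only the key combinations that actually occur.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic data (not real values).
toy = pd.DataFrame({
    'age': [5, 15, 30, 70, 8, 40],
    'pclass': [1, 2, 3, 1, 2, 3],
    'sex': ['female', 'male', 'female', 'male', 'male', 'female'],
    'survived': [1, 0, 1, 0, 1, 1],
})

# With labels given, pd.cut returns an ordered Categorical whose
# categories follow the labels list, not alphabetical order.
toy['agecat'] = pd.cut(toy.age, [0, 13, 20, 64, 100],
                       labels=['child', 'adolescent', 'adult', 'senior'])

# Grouping on the categorical column sorts agecat by category order;
# observed=True keeps only the combinations present in the data.
result = toy.groupby(['agecat', 'pclass', 'sex'], observed=True)['survived'].mean()
print(result)
```

With this approach no separate reordering step is needed, since the order is carried by the categorical dtype itself.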