Hi I have a DataFrame/Series with 2-level multi-index and one column. I would like to take the second-level index and use it as a column. For example (code taken from multi-index docs):
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.DataFrame(np.random.randn(8), index=index, columns=["col"])
Which looks like:
first  second
bar    one      -0.982656
       two      -0.078237
baz    one      -0.345640
       two      -0.160661
foo    one      -0.605568
       two      -0.140384
qux    one       1.434702
       two      -1.065408
dtype: float64
What I would like is to have a DataFrame with index [bar, baz, foo, qux] and columns [one, two].
You just need to unstack your series:
>>> s.unstack(level=1)
second       one       two
first                     
bar    -0.713374  0.556993
baz     0.523611  0.328348
foo     0.338351 -0.571854
qux     0.036694 -0.161852
                        Here's a solution using array reshaping -
>>> idx = s.index.levels
>>> c = len(idx[1])
>>> pd.DataFrame(s.values.reshape(-1,c),index=idx[0].values, columns=idx[1].values)
          one       two
bar  2.225401  1.624866
baz  1.067359  0.349440
foo -0.468149 -0.352303
qux  1.215427  0.429146
If you don't care about the names appearing on top of indexes -
>>> pd.DataFrame(s.values.reshape(-1,c), index=idx[0], columns=idx[1])
second       one       two
first                     
bar     2.225401  1.624866
baz     1.067359  0.349440
foo    -0.468149 -0.352303
qux     1.215427  0.429146
Timings for the given dataset size -
# @AChampion's solution
In [201]: %timeit s.unstack(level=1)
1000 loops, best of 3: 444 µs per loop
# Using array reshaping step-1
In [199]: %timeit s.index.levels
1000000 loops, best of 3: 214 ns per loop
# Using array reshaping step-2    
In [202]: %timeit pd.DataFrame(s.values.reshape(-1,2), index=idx[0], columns=idx[1])
10000 loops, best of 3: 47.3 µs per loop
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With