Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to apply hierarchy or multi-index to pandas columns

I have seen lots of examples on how to arrange dataframe row indexes hierarchically, but I am trying to do the same for columns and am not understanding the syntax:

Given:

df = pd.DataFrame(np.random.randn(10,10),
                  columns=['consumption', 'voltage', 'consumption', 
                           'voltage', 'temperature', 'humidity', 'consumption', 
                           'voltage','temperature','humidity'], 
                  index= pd.date_range('20000103',periods=10))

>>> df
            consumption   voltage  consumption   voltage  temperature  \
2000-01-03    -1.327735 -1.440285     0.317122 -1.120105     1.736651   
2000-01-04     0.132531  0.646972     2.296734  0.332154    -0.541792   
2000-01-05     0.127623  0.592778     0.162096  0.107398    -0.628785   
2000-01-06    -1.441151  0.215424     0.021068  0.683085    -0.783994   
2000-01-07    -0.157848  1.566780     0.599017 -0.628216     0.500251   
2000-01-08    -0.498926  0.338771     0.400159  1.571975     0.255635   
2000-01-09     0.516618 -1.936360     0.199388 -0.110415     2.690859   
2000-01-10    -0.779012 -1.310022    -1.207503  0.095679    -0.134244   
2000-01-11     0.644262  0.068196     1.041745 -0.444408    -0.751595   
2000-01-12    -0.608046  0.506588    -1.003893  0.473716     0.211991   

            humidity  consumption   voltage  temperature  humidity  
2000-01-03  0.039869     1.875807  0.129065     0.132419  0.572678  
2000-01-04  1.997363     0.543881 -1.235036     1.155389  1.282912  
2000-01-05 -0.458992     0.371589  0.698094     0.695067 -1.095875  
2000-01-06  2.512991     0.795234  1.220327    -0.688820  0.875705  
2000-01-07  0.263855    -1.253786 -0.308674     1.000057  1.474928  
2000-01-08 -0.614560    -0.398284  1.307488    -0.002438  1.572630  
2000-01-09  0.363889     2.571522  1.048124     2.574866 -0.417247  
2000-01-10 -0.125377     1.004011  1.312716    -2.036689  0.557569  
2000-01-11 -0.818585    -0.595743  1.106869    -2.226666 -0.679508  
2000-01-12  0.705707    -0.959365  0.689911     0.498411 -0.353557 

What I would like to do is add a hierarchical index or even something akin to a tag to the columns, so that they looked something like this:

BUILDING        1
DEVICETYPE      METER                  METER                  WEATHER
DEVICEID        A                      B                      A
FIELD           consumption   voltage  consumption   voltage  temperature  \
    2000-01-03    -1.327735 -1.440285     0.317122 -1.120105     1.736651   
    2000-01-04     0.132531  0.646972     2.296734  0.332154    -0.541792   
    2000-01-05     0.127623  0.592778     0.162096  0.107398    -0.628785   
    2000-01-06    -1.441151  0.215424     0.021068  0.683085    -0.783994   
    2000-01-07    -0.157848  1.566780     0.599017 -0.628216     0.500251   
    2000-01-08    -0.498926  0.338771     0.400159  1.571975     0.255635   
    2000-01-09     0.516618 -1.936360     0.199388 -0.110415     2.690859   
    2000-01-10    -0.779012 -1.310022    -1.207503  0.095679    -0.134244   
    2000-01-11     0.644262  0.068196     1.041745 -0.444408    -0.751595   
    2000-01-12    -0.608046  0.506588    -1.003893  0.473716     0.211991   

BUILDING                  2
DEVICETYPE                METER                  WEATHER
DEVICEID                  A                      A
                humidity  consumption   voltage  temperature  humidity  
    2000-01-03  0.039869     1.875807  0.129065     0.132419  0.572678  
    2000-01-04  1.997363     0.543881 -1.235036     1.155389  1.282912  
    2000-01-05 -0.458992     0.371589  0.698094     0.695067 -1.095875  
    2000-01-06  2.512991     0.795234  1.220327    -0.688820  0.875705  
    2000-01-07  0.263855    -1.253786 -0.308674     1.000057  1.474928  
    2000-01-08 -0.614560    -0.398284  1.307488    -0.002438  1.572630  
    2000-01-09  0.363889     2.571522  1.048124     2.574866 -0.417247  
    2000-01-10 -0.125377     1.004011  1.312716    -2.036689  0.557569  
    2000-01-11 -0.818585    -0.595743  1.106869    -2.226666 -0.679508  
    2000-01-12  0.705707    -0.959365  0.689911     0.498411 -0.353557 
like image 956
riddley_w Avatar asked Jan 16 '14 01:01

riddley_w


1 Answers

You can define MultiIndices using the from_arrays or from_tuples or from_product classmethods. Here is an example using from_arrays:

arrays = [[1, 2]*3, ['A', 'B', 'C']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])

df = pd.DataFrame(np.random.randn(2,6),
                  columns=columns,
                  index= pd.date_range('20000103',periods=2))

yields

In [81]: df
Out[81]: 
foo                1         2         1         2         1         2
bar                A         B         C         A         B         C
2000-01-03  1.277234 -0.899547  0.040337 -0.878752 -0.524336  0.922440
2000-01-04 -1.706797  0.450379  1.510868 -2.539827 -1.909996 -0.003851

Defining a MultiIndex for the index is done in exactly the same way as for columns.

like image 182
unutbu Avatar answered Sep 23 '22 02:09

unutbu