I have seen lots of examples on how to arrange dataframe row indexes hierarchically, but I am trying to do the same for columns and am not understanding the syntax:
Given:
df = pd.DataFrame(np.random.randn(10,10),
columns=['consumption', 'voltage', 'consumption',
'voltage', 'temperature', 'humidity', 'consumption',
'voltage','temperature','humidity'],
index= pd.date_range('20000103',periods=10))
>>> df
consumption voltage consumption voltage temperature \
2000-01-03 -1.327735 -1.440285 0.317122 -1.120105 1.736651
2000-01-04 0.132531 0.646972 2.296734 0.332154 -0.541792
2000-01-05 0.127623 0.592778 0.162096 0.107398 -0.628785
2000-01-06 -1.441151 0.215424 0.021068 0.683085 -0.783994
2000-01-07 -0.157848 1.566780 0.599017 -0.628216 0.500251
2000-01-08 -0.498926 0.338771 0.400159 1.571975 0.255635
2000-01-09 0.516618 -1.936360 0.199388 -0.110415 2.690859
2000-01-10 -0.779012 -1.310022 -1.207503 0.095679 -0.134244
2000-01-11 0.644262 0.068196 1.041745 -0.444408 -0.751595
2000-01-12 -0.608046 0.506588 -1.003893 0.473716 0.211991
humidity consumption voltage temperature humidity
2000-01-03 0.039869 1.875807 0.129065 0.132419 0.572678
2000-01-04 1.997363 0.543881 -1.235036 1.155389 1.282912
2000-01-05 -0.458992 0.371589 0.698094 0.695067 -1.095875
2000-01-06 2.512991 0.795234 1.220327 -0.688820 0.875705
2000-01-07 0.263855 -1.253786 -0.308674 1.000057 1.474928
2000-01-08 -0.614560 -0.398284 1.307488 -0.002438 1.572630
2000-01-09 0.363889 2.571522 1.048124 2.574866 -0.417247
2000-01-10 -0.125377 1.004011 1.312716 -2.036689 0.557569
2000-01-11 -0.818585 -0.595743 1.106869 -2.226666 -0.679508
2000-01-12 0.705707 -0.959365 0.689911 0.498411 -0.353557
What I would like to do is add a hierarchical index or even something akin to a tag to the columns, so that they looked something like this:
BUILDING 1
DEVICETYPE METER METER WEATHER
DEVICEID A B A
FIELD consumption voltage consumption voltage temperature \
2000-01-03 -1.327735 -1.440285 0.317122 -1.120105 1.736651
2000-01-04 0.132531 0.646972 2.296734 0.332154 -0.541792
2000-01-05 0.127623 0.592778 0.162096 0.107398 -0.628785
2000-01-06 -1.441151 0.215424 0.021068 0.683085 -0.783994
2000-01-07 -0.157848 1.566780 0.599017 -0.628216 0.500251
2000-01-08 -0.498926 0.338771 0.400159 1.571975 0.255635
2000-01-09 0.516618 -1.936360 0.199388 -0.110415 2.690859
2000-01-10 -0.779012 -1.310022 -1.207503 0.095679 -0.134244
2000-01-11 0.644262 0.068196 1.041745 -0.444408 -0.751595
2000-01-12 -0.608046 0.506588 -1.003893 0.473716 0.211991
BUILDING 2
DEVICETYPE METER WEATHER
DEVICEID A A
humidity consumption voltage temperature humidity
2000-01-03 0.039869 1.875807 0.129065 0.132419 0.572678
2000-01-04 1.997363 0.543881 -1.235036 1.155389 1.282912
2000-01-05 -0.458992 0.371589 0.698094 0.695067 -1.095875
2000-01-06 2.512991 0.795234 1.220327 -0.688820 0.875705
2000-01-07 0.263855 -1.253786 -0.308674 1.000057 1.474928
2000-01-08 -0.614560 -0.398284 1.307488 -0.002438 1.572630
2000-01-09 0.363889 2.571522 1.048124 2.574866 -0.417247
2000-01-10 -0.125377 1.004011 1.312716 -2.036689 0.557569
2000-01-11 -0.818585 -0.595743 1.106869 -2.226666 -0.679508
2000-01-12 0.705707 -0.959365 0.689911 0.498411 -0.353557
You can define MultiIndices
using the from_arrays
or from_tuples
or from_product
classmethods. Here is an example using from_arrays
:
arrays = [[1, 2]*3, ['A', 'B', 'C']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.random.randn(2,6),
columns=columns,
index= pd.date_range('20000103',periods=2))
yields
In [81]: df
Out[81]:
foo 1 2 1 2 1 2
bar A B C A B C
2000-01-03 1.277234 -0.899547 0.040337 -0.878752 -0.524336 0.922440
2000-01-04 -1.706797 0.450379 1.510868 -2.539827 -1.909996 -0.003851
Defining a MultiIndex for the index is done in exactly the same way as for columns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With