I'm brand new to pandas for Python. I have a data file with multiple row labels (per row) and multiple column labels (per column), like the following observation counts for 3 different animals (dog, bat, ostrich) at multiple recording times (e.g. Monday morning, day, night):
''     , ''          , colLabel:name     , dog     , bat        , Ostrich
''     , ''          , colLabel:genus    , Canis   , Chiroptera , Struthio,
''     , ''          , colLabel:activity , diurnal , nocturnal  , diurnal
day    , time of day , ''                ,         ,            ,
Monday , morning     , ''                , 17      , 5          , 2
Monday , day         , ''                , 63      , 0          , 34
Monday , night       , ''                , 21      , 68         , 1
Friday , day         , ''                , 72      , 0          , 34
I'd like to read this data into pandas so that both the rows and the columns are hierarchically organized. What is the best way of doing this?
You can use the header, index_col and tupleize_cols arguments of read_csv:
In [1]: df = pd.read_csv('foo.csv', header=[0, 1, 2], index_col=[0, 1], tupleize_cols=False, sep=r'\s*,\s+')
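Note: in 0.13 tupleize_cols=False will be the default, so you won't need to pass it. In later releases (pandas 1.0 and up) the argument was removed altogether, so leave it out there.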
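As a rough, self-contained sketch of the call above on a recent pandas, the snippet below reads a simplified copy of the question's data from a string. The simplifications are assumptions made for this example only: placeholder cells are left truly empty instead of `''`, the stray trailing comma after `Struthio` is dropped, and a plain comma separator is used, so neither the regex `sep` nor `tupleize_cols` is passed.

```python
import io

import pandas as pd

# Simplified copy of the question's file (an assumption for this sketch):
# empty cells instead of '' placeholders, no trailing comma, plain "," separator.
CSV_TEXT = """\
,,colLabel:name,dog,bat,Ostrich
,,colLabel:genus,Canis,Chiroptera,Struthio
,,colLabel:activity,diurnal,nocturnal,diurnal
day,time of day,,,,
Monday,morning,,17,5,2
Monday,day,,63,0,34
Monday,night,,21,68,1
Friday,day,,72,0,34
"""

# header=[0, 1, 2]: the first three lines become three column levels.
# index_col=[0, 1]: the first two fields of each row become a two-level row index.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1, 2], index_col=[0, 1])

print(df.index.names)      # row level names, read from the line after the header
print(df.columns.nlevels)  # number of column levels
print(df.columns[0])       # the "colLabel:..." column is still an ordinary column
```

The last line shows that the column level labels arrive as a regular (empty) first column rather than as level names.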
A little bit of hacking is required to get out the column level names, because they are read in as an ordinary (empty) first column rather than as names of the column levels:
In [2]: df.columns.names = df.columns[0]
In [3]: del df[df.columns[0]]
In [4]: df
Out[4]:
colLabel:name          dog        bat   Ostrich
colLabel:genus       Canis Chiroptera Struthio,
colLabel:activity  diurnal  nocturnal   diurnal
day    time of day
Monday morning          17          5         2
       day              63          0        34
       night            21         68         1
Friday day              72          0        34
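The renaming step can also be wrapped in a small helper, and once the levels are named they can be used for selection. The following is only a sketch: `promote_label_column` is a hypothetical name, and the frame is built by hand as a stand-in for what read_csv returns (with the trailing comma after `Struthio` dropped), so it runs without the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame left by read_csv: the first column
# carries only the level labels, and the column levels are still unnamed.
columns = pd.MultiIndex.from_tuples([
    ("colLabel:name", "colLabel:genus", "colLabel:activity"),
    ("dog", "Canis", "diurnal"),
    ("bat", "Chiroptera", "nocturnal"),
    ("Ostrich", "Struthio", "diurnal"),
])
index = pd.MultiIndex.from_tuples(
    [("Monday", "morning"), ("Monday", "day"), ("Monday", "night"), ("Friday", "day")],
    names=["day", "time of day"],
)
values = [
    [np.nan, 17, 5, 2],
    [np.nan, 63, 0, 34],
    [np.nan, 21, 68, 1],
    [np.nan, 72, 0, 34],
]
df = pd.DataFrame(values, index=index, columns=columns)


def promote_label_column(frame):
    """Use the first column's labels as column level names, then drop that column."""
    label_key = frame.columns[0]
    result = frame.drop(columns=[label_key])
    result.columns = result.columns.set_names(list(label_key))
    return result


tidy = promote_label_column(df)

# Select across column levels by name: every diurnal animal.
diurnal = tidy.xs("diurnal", axis=1, level="colLabel:activity")

# Select across row levels by name: every Monday observation.
monday = tidy.xs("Monday", level="day")

print(tidy.columns.names)
print(diurnal)
print(monday)
```

Unlike the in-place `del`, this version returns a new frame and leaves the original untouched, which makes it easier to re-run in a notebook.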