I have a time series excel file with a tri-level column MultiIndex that I would like to successfully parse if possible. There are some results on how to do this for an index on stack overflow but not the columns and the parse function has a header that does not seem to take a list of rows.
The ExcelFile looks like is like the following:
So there are two low_level values many mid_level values and a few top_level values but the trick is the top and mid level values are null and are assumed to be the values to the left. So, for instance all the columns above would have top_level1 as the top multi-index value.
My best idea so far is to use transpose, but the it fills Unnamed: # everywhere and doesn't seem to work. In Pandas 0.13 read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.
You can fillna the null values. I don't have your file, but you can test
#Headers as rows for now
df = pd.read_excel(xls_file,0, header=None, index_col=0)
#fill in Null values in "Headers"
df = df.fillna(method='ffill', axis=1)
#create multiindex column names
df.columns=pd.MultiIndex.from_arrays(df[:3].values, names=['top','mid','low'])
#Just name of index
df.index.name='Date'
#remove 3 rows which are already used as column names
df = df[pd.notnull(df.index)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With