I am trying to read an Excel file using pandas, but I am not sure whether I can read it the way I need.
My file is like this:

I am reading the file like this:
excel_file = pd.ExcelFile('MY_FILE')
df = excel_file.parse(sheet_name=0, header=1)
This way I am able to read the file, but I cannot tell which group each variable belongs to. For each column, I need to know which group it comes from. Is there any way to do this?
Thank you!
You can pass the first and second rows to the header parameter of read_excel to get a MultiIndex in the columns, and use index_col to build the index from the first column:
import pandas as pd

df = pd.read_excel('file.xlsx', header=[0, 1], index_col=[0], sheet_name=0)
Your own code works with the same parameters:
excel_file = pd.ExcelFile('file.xlsx')
df = excel_file.parse(header=[0,1], index_col=[0], sheet_name=0)
print(df)
CUSTOM NAME   g1        g2
NAME           A    B    A    B    C
NAME 1       1.0  NaN    1  NaN  1.0
NAME 1       NaN  1.0    1  1.0  NaN
print(df.columns)
MultiIndex(levels=[['g1', 'g2'], ['A', 'B', 'C']],
           codes=[[0, 0, 1, 1, 1], [0, 1, 0, 1, 2]],
           names=['CUSTOM NAME', 'NAME'])
print(df.index)
Index(['NAME 1', 'NAME 1'], dtype='object')
To filter, select a column of the MultiIndex with a tuple:
print(df[df[('g1', 'A')] == 1])
CUSTOM NAME   g1        g2
NAME           A    B    A    B    C
NAME 1       1.0  NaN    1  NaN  1.0
More information is in "Select rows in pandas MultiIndex DataFrame"; the same approaches apply here, just without loc, because the MultiIndex is in the columns rather than in the index.
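As a rough illustration of selecting by group once the columns are a MultiIndex, here is a minimal sketch. It builds the DataFrame in memory instead of reading the Excel file, so the values and the second row label ("NAME 2") are stand-ins, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed sheet: two header levels (group, variable).
columns = pd.MultiIndex.from_tuples(
    [("g1", "A"), ("g1", "B"), ("g2", "A"), ("g2", "B"), ("g2", "C")],
    names=["CUSTOM NAME", "NAME"],
)
df = pd.DataFrame(
    [[1.0, np.nan, 1, np.nan, 1.0], [np.nan, 1.0, 1, 1.0, np.nan]],
    index=["NAME 1", "NAME 2"],
    columns=columns,
)

# All variables that belong to one group (top level of the columns).
g1 = df["g1"]

# One variable across every group (cross-section on the second level).
a_all = df.xs("A", axis=1, level="NAME")

# Rows where variable A of group g1 equals 1, selected with a tuple.
rows = df[df[("g1", "A")] == 1]

print(g1)
print(a_all)
print(rows)
```

Selecting with df["g1"] drops the group level from the result, while xs keeps the group names as the remaining column labels.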
You can get MultiIndex columns by passing a list of integers to header:
excel_file = pd.ExcelFile('example.xlsx')
df = excel_file.parse(sheet_name=0, header=[0,1])
The resulting DataFrame:
CUSTOM NAME GROUP 1      GROUP 2
NAME              A    B       A    B    C
NAME 1          1.0  NaN       1  NaN  1.0
NAME 2          NaN  1.0       1  1.0  NaN
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
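To answer the original question directly (which group does each column come from), the group can be read off the first level of the column MultiIndex. The sketch below assumes a DataFrame shaped like the one above; it is constructed in memory rather than parsed from the file, and the flattened names such as "GROUP 1_A" are just one possible convention.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for excel_file.parse(sheet_name=0, header=[0, 1]).
columns = pd.MultiIndex.from_tuples(
    [("GROUP 1", "A"), ("GROUP 1", "B"),
     ("GROUP 2", "A"), ("GROUP 2", "B"), ("GROUP 2", "C")],
    names=["CUSTOM NAME", "NAME"],
)
df = pd.DataFrame(
    [[1.0, np.nan, 1, np.nan, 1.0], [np.nan, 1.0, 1, 1.0, np.nan]],
    index=["NAME 1", "NAME 2"],
    columns=columns,
)

# Group label for every column, in column order.
groups = df.columns.get_level_values(0)

# Each column label is a (group, variable) tuple, so it can be unpacked.
by_group = {}
for group, name in df.columns:
    by_group.setdefault(group, []).append(name)

# Optional: flatten to single-level names that still carry the group.
flat = df.copy()
flat.columns = [f"{group}_{name}" for group, name in df.columns]

print(list(groups))
print(by_group)
print(list(flat.columns))
```

Flattening is only worth doing if later code cannot work with tuple column labels; otherwise keeping the MultiIndex preserves the group information without string parsing.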