Read excel sheet with multiple header using Pandas

I have an Excel sheet with multiple header rows, like this:

_________________________________________________________________________
____|_____|        Header1    |        Header2     |        Header3      |
ColX|ColY |ColA|ColB|ColC|ColD||ColD|ColE|ColF|ColG||ColH|ColI|ColJ|ColK |
1   | ds  | 5  | 6  |9   |10  | .......................................
2   | dh  |  ..........................................................
3   | ge  |  ..........................................................
4   | ew  |  ..........................................................
5   | er  |  ..........................................................

Notice that the first two columns have no top-level header (those cells are blank), while the other columns fall under Header1, Header2 and Header3. I want to read this sheet and merge it with other sheets that have a similar structure.

I want to merge on the first column, 'ColX'. Right now I am doing this:

import pandas as pd

totalMergedSheet = pd.DataFrame([1,2,3,4,5], columns=['ColX'])
file = pd.ExcelFile('ExcelFile.xlsx')
for i in range(1, len(file.sheet_names)):
    df1 = file.parse(file.sheet_names[i-1])
    df2 = file.parse(file.sheet_names[i])
    newMergedSheet = pd.merge(df1, df2, on='ColX')
    totalMergedSheet = pd.merge(totalMergedSheet, newMergedSheet, on='ColX')

But this does not read the columns correctly, and I don't think it will return the results the way I want. The resulting frame should look like this:

________________________________________________________________________________________________________
____|_____|        Header1    |        Header2     |        Header3      |        Header4     |        Header5      |
ColX|ColY |ColA|ColB|ColC|ColD||ColD|ColE|ColF|ColG||ColH|ColI|ColJ|ColK| ColL|ColM|ColN|ColO||ColP|ColQ|ColR|ColS|
1   | ds  | 5  | 6  |9   |10  | ..................................................................................
2   | dh  |  ...................................................................................
3   | ge  |  ....................................................................................
4   | ew  |  ...................................................................................
5   | er  |  ......................................................................................

Any suggestions, please? Thanks.

asked Nov 11 '16 18:11

muazfaiz

1 Answer

[See comments for updates and corrections]

Pandas already has a function that will read in an entire Excel spreadsheet for you, so you don't need to manually parse/merge each sheet. Take a look at pandas.read_excel(). It not only lets you read in an Excel file in a single line, it also provides options to help solve the problem you're having.

Since you have subcolumns, what you're looking for is a MultiIndex. By default, pandas reads the top row as the sole header row. You can pass a header argument to pandas.read_excel() that lists which rows to use as headers. In your particular case, you'd want header=[0, 1], meaning the first two rows. You might also have multiple sheets, so you can pass sheet_name=None as well (this tells it to read all sheets; older pandas versions spelled this argument sheetname). The command would be:

import pandas

df_dict = pandas.read_excel('ExcelFile.xlsx', header=[0, 1], sheet_name=None)
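
To see what header=[0, 1] produces without needing an Excel file on disk, here is a minimal sketch that uses pandas.read_csv on an in-memory string as a stand-in (read_csv accepts the same header=[0, 1] argument). The CSV text is a cut-down version of the sheet from the question, and the second data row is made up for illustration. One difference to keep in mind: in the CSV the top-level header is repeated in every cell, whereas a real sheet would typically use merged cells.

```python
import io

import pandas as pd

# Stand-in for one sheet: the first two columns have a blank top-level header.
# Values in the second data row are invented for illustration.
raw = (
    ",,Header1,Header1,Header2,Header2\n"
    "ColX,ColY,ColA,ColB,ColC,ColD\n"
    "1,ds,5,6,9,10\n"
    "2,dh,7,8,11,12\n"
)

# header=[0, 1] tells pandas to build a two-level column index from rows 0 and 1.
df = pd.read_csv(io.StringIO(raw), header=[0, 1])

print(df.columns.nlevels)                 # number of header levels
print(df["Header1"])                      # all subcolumns under Header1
print(df[("Header2", "ColD")].tolist())   # a single subcolumn, addressed by tuple
```

The blank top-level cells should come back with placeholder labels (of the form "Unnamed: 0_level_0"), so ColX and ColY are still reachable through the second level of the column index.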

This returns a dictionary where the keys are the sheet names and the values are the DataFrames for each sheet. If you want to collapse it all into one DataFrame, you can use pandas.concat, which with axis=0 stacks the sheets' rows on top of one another:

df = pandas.concat(df_dict.values(), axis=0)
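
Note that axis=0 stacks rows, while the layout asked for in the question places the sheets side by side, aligned on ColX. The following is only a sketch of one possible way to do that, not part of the original answer. It assumes every sheet has the key columns (ColX, ColY) as its first two columns and that the key values match across sheets. The helper name combine_on_keys and the sample frames are hypothetical stand-ins for the parsed sheets.

```python
import pandas as pd

# Hypothetical stand-ins for two parsed sheets; in practice these would be
# the values of df_dict returned by the read_excel call above.
keys = [("Unnamed: 0_level_0", "ColX"), ("Unnamed: 1_level_0", "ColY")]
sheet1 = pd.DataFrame(
    [[1, "ds", 5, 6], [2, "dh", 7, 8]],
    columns=pd.MultiIndex.from_tuples(keys + [("Header1", "ColA"), ("Header1", "ColB")]),
)
sheet2 = pd.DataFrame(
    [[1, "ds", 50, 60], [2, "dh", 70, 80]],
    columns=pd.MultiIndex.from_tuples(keys + [("Header4", "ColL"), ("Header4", "ColM")]),
)
df_dict = {"Sheet1": sheet1, "Sheet2": sheet2}


def combine_on_keys(frames, n_keys=2):
    """Join frames side by side, aligning rows on their first n_keys columns."""
    indexed = []
    for frame in frames:
        key_cols = list(frame.columns[:n_keys])  # column labels are tuples here
        keyed = frame.set_index(key_cols)
        # Keep only the second-level names ('ColX', 'ColY') as index names.
        keyed.index.names = [col[-1] for col in key_cols]
        indexed.append(keyed)
    # axis=1 places the frames next to each other, matching rows by index.
    return pd.concat(indexed, axis=1)


wide = combine_on_keys(df_dict.values())
print(wide)
```

Moving the key columns into the index means they appear once in the result instead of being repeated for every sheet. Calling wide.reset_index() would turn them back into ordinary columns if needed.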

answered Oct 04 '22 15:10

beeftendon

Tags: python, pandas, dataframe, excel
