
pandas Combine Excel Spreadsheets

Tags: python, excel

I have an Excel workbook with many tabs. Each tab has the same set of headers as all others. I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).

So far, I've tried:

import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()

Can I pass something as the parse argument that means "all spreadsheets"? Or is this the wrong approach?

Thanks in advance!

Update: I tried:

a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b

But it's not "working".

Dance Party2 asked Mar 11 '16 21:03




1 Answer

This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.

import pandas as pd

# Set sheet_name to None to load all sheets into a dict of dataframes
# (keyed by sheet name). index_col=None keeps the default integer index.
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)

# Then concatenate all dataframes; ignore_index=True renumbers the rows
# to avoid overlapping index values (see comment by @bunji)
cdf = pd.concat(df.values(), ignore_index=True)

print(cdf)
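
As for the loop in the question: DataFrame.append returned a new dataframe rather than modifying b in place, so the result was discarded on every pass (and the method was removed in pandas 2.0). Collecting the frames and calling pd.concat once avoids that. If you also want to know which tab each row came from, here is a minimal sketch. The helper name combine_sheets and the added column name "sheet" are my own choices for illustration, not part of pandas, and the in-memory frames stand in for a real workbook.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} dict into one DataFrame.

    `sheets` has the shape returned by pd.read_excel(path, sheet_name=None).
    A "sheet" column (hypothetical name) records the tab each row came from.
    """
    # Tag every frame with its sheet name before stacking.
    frames = [frame.assign(sheet=name) for name, frame in sheets.items()]
    # One concat call; ignore_index=True renumbers rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Stand-in data; with a real file this would be
    # sheets = pd.read_excel('file.xlsx', sheet_name=None)
    sheets = {
        "Jan": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
        "Feb": pd.DataFrame({"item": ["c"], "qty": [3]}),
    }
    print(combine_sheets(sheets))
```

Note that pd.concat raises a ValueError when given no frames, so an empty workbook dict would need to be handled separately.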
daedalus answered Oct 04 '22 05:10