I have an Excel workbook with many tabs. Each tab has the same set of headers as all others. I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I've tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I use something for the parse argument that will mean "all spreadsheets"? Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it's not "working".
On the Data tab, under Tools, click Consolidate. In the Function box, click the function that you want Excel to use to consolidate the data. In each source sheet, select your data, and then click Add. The file path is entered in All references.
To read an Excel file as a DataFrame, use the pandas read_excel() function. Depending on its sheet_name argument, it reads the first sheet, one specific sheet, several sheets, or all sheets.
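As a rough sketch of those four cases: the helper name read_variants, the path, and the sheet names are hypothetical, and reading .xlsx files assumes an engine such as openpyxl is installed.

```python
import pandas as pd


def read_variants(path, names):
    """Show the common shapes of the sheet_name argument.

    `path` is a workbook path and `names` is a list of at least two
    sheet names known to exist in it (both are placeholders here).
    """
    # Default sheet_name=0: the first sheet, returned as a DataFrame.
    first = pd.read_excel(path)
    # A single sheet selected by name: also a DataFrame.
    one = pd.read_excel(path, sheet_name=names[0])
    # A list of sheets: a dict of DataFrames keyed by the requested names.
    some = pd.read_excel(path, sheet_name=names[:2])
    # None: a dict containing every sheet in the workbook.
    everything = pd.read_excel(path, sheet_name=None)
    return first, one, some, everything
```

Only the last two forms return a dict, so those are the ones that need a concatenation step afterwards.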
This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.
import pandas as pd
# Set sheet_name to None to load all sheets into a dict of DataFrames, and
# index_col=None so that no column is used as the index (see comment by @bunji)
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)
# Then concatenate all DataFrames
cdf = pd.concat(df.values())
print(cdf)
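One possible refinement, sketched under assumptions: concatenating as above keeps each sheet's own row labels, so the combined index repeats 0, 1, 2, ... once per sheet. Passing ignore_index=True to pd.concat renumbers the rows instead. The helper name combine_sheets and the added "sheet" column are hypothetical and not part of the original answer.

```python
import pandas as pd


def combine_sheets(sheets, keep_sheet_name=False):
    """Stack a dict of DataFrames, such as read_excel(sheet_name=None) returns.

    If keep_sheet_name is True, a "sheet" column (a name chosen only for
    this sketch) records which tab each row came from.
    """
    if keep_sheet_name:
        frames = [frame.assign(sheet=name) for name, frame in sheets.items()]
    else:
        frames = list(sheets.values())
    # ignore_index=True gives the result a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Small in-memory stand-in for a two-tab workbook.
demo = {
    "Jan": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "Feb": pd.DataFrame({"item": ["c"], "qty": [3]}),
}
print(combine_sheets(demo, keep_sheet_name=True))
```

This also suggests why the loop in the question appeared not to work: DataFrame.append returned a new frame rather than modifying b in place, and the method was removed in pandas 2.0. Collecting the parsed sheets in a list and calling pd.concat once is the usual replacement.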