I have an Excel file foo.xlsx with about 40 sheets sh1, sh2, etc. Each sheet has the format:
area   cnt   name\nparty1   name\nparty2
blah   9     5              5
word   3     7              5
In each sheet I want to rename the columns that have the format name\nparty so that only the party remains as the label. Example output:
area    cnt   party1   party2   sheet
bacon   9     5        5        sh1
spam    3     7        5        sh1
eggs    2     18       4        sh2
I am reading in the file with:
book = pd.ExcelFile(path)
And then wondering if I need to do:
for f in filelist:
    df = pd.ExcelFile.parse(book, sheetname=??)
    'more operations here'
    # only change column names 2 and 3
    for i, col in enumerate(df):
        if i >= 2 and i <= 3:
            new_col_name = col.split("\n")[-1]
            df[new_col_name] =
Or something like that?
The read_excel method of pandas lets you read all sheets in at once if you set the keyword parameter sheet_name=None (in some older versions of pandas this was called sheetname). This returns a dictionary: the keys are the sheet names, and the values are the sheets as dataframes.
Using this, we can simply loop through the dictionary and:
1. Add a column to each dataframe holding the name of the sheet it came from.
2. Use the rename method to rename our columns. By using a lambda, we simply take the final entry of the list obtained by splitting each column name any time there is a new line. If there is no new line, the column name is unchanged.
3. Append the dataframe to a list, to be combined at the end.

Once this is done, we combine all the sheets into one with pd.concat. Then we reset the index and all should be well. Note: if you have parties present on one sheet but not others, this will still work but will fill any missing columns for each sheet with NaN.
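To see the renaming and the NaN filling without needing an Excel file, here is a minimal sketch that uses two small in-memory dataframes as stand-ins for the sheets. The data and the sheet names sh1 and sh2 are made up for illustration, and party2 is deliberately left off the second one.

```python
import pandas as pd

# Hypothetical stand-ins for two sheets; no Excel file is read here.
sheets_dict = {
    "sh1": pd.DataFrame({"area": ["bacon", "spam"], "cnt": [9, 3],
                         "name\nparty1": [5, 7], "name\nparty2": [5, 5]}),
    "sh2": pd.DataFrame({"area": ["eggs"], "cnt": [2],
                         "name\nparty1": [18]}),  # party2 missing on purpose
}

all_sheets = []
for name, sheet in sheets_dict.items():
    # Keep only the text after the last newline; names without one are unchanged.
    sheet = sheet.rename(columns=lambda x: x.split("\n")[-1])
    sheet["sheet"] = name
    all_sheets.append(sheet)

# ignore_index=True renumbers the rows, much like resetting the index afterwards.
full_table = pd.concat(all_sheets, ignore_index=True)
print(full_table)
```

The row that comes from sh2 should show NaN under party2, and that column becomes a float column as a result.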
import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: dataframe}
sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)

all_sheets = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    # keep only the part of each column name after the last newline
    sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    all_sheets.append(sheet)

full_table = pd.concat(all_sheets)
full_table.reset_index(inplace=True, drop=True)

print(full_table)
Prints:
    area  cnt  party1  party2   sheet
0  bacon    9       5       5  Sheet1
1   spam    3       7       5  Sheet1
2   eggs    2      18       4  Sheet2
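The question's sketch only touches the columns at positions 2 and 3. If that restriction matters (for example, because some other column name also contains a newline that should be kept), one possible variation is to build a mapping for just those positions and pass it to rename. This is a sketch on a made-up dataframe shaped like the question's example, not part of the answer above.

```python
import pandas as pd

# Hypothetical dataframe shaped like one sheet from the question.
df = pd.DataFrame({"area": ["blah", "word"], "cnt": [9, 3],
                   "name\nparty1": [5, 7], "name\nparty2": [5, 5]})

# Build an old-name -> new-name mapping for the columns at positions 2 and 3 only.
mapping = {col: col.split("\n")[-1] for col in df.columns[2:4]}
df = df.rename(columns=mapping)

print(list(df.columns))
```

Columns that are not in the mapping are left as they are.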