Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?

Tags:

Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means that any file I want to create programmatically must be something I basically create from scratch, and then other Excel files (with the aforementioned sophistication) have to refer to that file (which creates a variety of other dependency issues).

My understanding of Excel file 'tabs' is that they're actually just a collection of XML files. Well, is it possible to use pandas (or one of the underlying read/write engines such as xlsxwriter or openpyxl to modify just one of the tabs, leaving other tabs (with more wicked stuff in there) intact?

EDIT: I'll try to further articulate the problem with an example.

Excel Sheet test.xlsx has four tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4
I read Sheet3 into a DataFrame (let's call it df) using pandas.read_excel()
Sheet1 and Sheet2 contain formulas, graphs, and various formatting that neither openpyxl nor xlrd can successfully parse, and Sheet4 contains other data. I don't want to touch those tabs at all.
Sheet2 actually has some references to cells on Sheet3
I make some edits to df and now want to write it back to sheet3, leaving the other sheets untouched (and the references to it from other worksheets in the workbook intact)

Can I do that and, if so, how?

414

asked Jan 25 '15 22:01

HaPsantran

2 Answers

I had a similar question regarding the interaction between excel and python (in particular, pandas), and I was referred to this question.

Thanks to some pointers by stackoverflow community, I found a package called xlwings that seems to cover a lot of the functionalities HaPsantran required.

To use the OP's example:

Working with an existing excel file, you can drop an anchor in the data block (Sheet3) you want to import to pandas by naming it in excel and do:

# opened an existing excel file

wb = Workbook(Existing_file)

# Find in the excel file a named cell and reach the boundary of the cell block (boundary defined by empty column / row) and read the cell

df = Range(Anchor).table.value

# import pandas and manipulate the data block
df = pd.DataFrame(df) # into Pandas DataFrame
df['sum'] = df.sum(axis= 1)

# write back to Sheet3
Range(Anchor).value = df.values

tested that this implementation didn't temper existing formula in the excel file

Let me know if this solves your problem and if there's anything I can help.

Big kudos to the developer of xlwings, they made this possible.

Below is an update to my earlier answer after further question from @jamzsabb, and to reflect a changed API after xlwings updated to >= 0.9.0.

import xlwings as xw
import pandas as pd
target_df = xw.Range('A7').options(pd.DataFrame, expand='table').value # only do this if the 'A7' cell (the cell within area of interest) is in active worksheet
#otherwise do:
#sht = xw.Book(r'path to your xlxs file\name_of_file.xlsx`).sheets['name of sheet']
#target_df = sht.Range('A7').options(pd.DataFrame, expand='table').value # you can also change 'A7' to any name that you've given to a cell like 'interest_table`

116

answered Sep 30 '22 02:09

PaulDong

I'm 90% confident the answer to "can pandas do this" is no. Posting a negative is tough, because there always might be something clever that I've missed, but here's a case:

Possible interface engines are xlrd/xlwt/xlutils, openpyxl, and xlsxwriter. None will work for your purposes, as xlrd/wt don't support all formulae, xlsxwriter can't modify existing xlsx files, and openpyxl loses images and charts.

Since I often need to do this, I've taken to only writing simple output to a separate file and then calling the win32api directly to copy the data between the workbooks while preserving all of my colleague's shiny figures. It's annoying, because it means I have to do it under Windows instead of *nix, but it works.

If you're working under Windows, you could do something similar. (I wonder if it makes sense to add a native insert option using this approach to help people in this situation, or if we should simply post a recipe.)

P.S.: This very problem has annoyed me enough from time to time that I've thought of learning enough of the modern Excel format to add support for this to one of the libraries.

P.P.S.: But since ignoring things you're not handling and returning them unmodified seems easy enough, the fact that no one seems to support it makes me think there are some headaches, and where Redmond's involved I'm willing to believe it. @john-machin would know the details, if he's about..

answered Sep 30 '22 04:09

DSM

Related questions
                            
                                Class attribute shadowing in Python class [duplicate]
                            
                                Logging with WSGI server and flask application
                            
                                np.ndarray with Periodic Boundary conditions
                            
                                What's the order of __hash__ and __eq__ evaluation for a Python dict?
                            
                                Persist authenticated session between crawls for development in Scrapy
                            
                                Calculate F-distribution p values in python?
                            
                                Pycharm debugger, view as array option
                            
                                pandas append same series to each column
                            
                                Using groupby ("1d") and first_valid_index together
                            
                                What are the best practices for sharing a module among subpackages?
                            
                                python socket.error: [Errno 9] Bad file descriptor
                            
                                Scope of Spark's `persist` or `cache`
                            
                                Qcut Pandas : ValueError: Bin edges must be unique
                            
                                Requiring only one of two dependencies in a requirements file
                            
                                Assign/map colors to the points in Seaborn.regplot (Python 3)
                            
                                Are there anything similar to "perl -pe" option in python?
                            
                                Looking for an efficient way to store history data
                            
                                unable to install python from source
                            
                                Using Popen in a thread blocks every incoming Flask-SocketIO request
                            
                                Python enumerate reverse index only

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?

Tags:

python

pandas

excel

HaPsantran

People also ask

2 Answers

tested that this implementation didn't temper existing formula in the excel file

PaulDong

DSM

Recent Activity

Donate For Us