
Check workbook for sheet and add if missing

I am trying to simply check if a sheet exists in an .xlsx file and if not I want to add it.

book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine = 'openpyxl')
writer.book = book

if 'testSheet' in book.sheetnames:
    pass
else:
    book.add_sheet(book['testSheet'])

Any ideas as to why this doesn't work?

MaxB asked Nov 27 '22 19:11



1 Answer

If you are only working with Excel files with the *.xlsx extension, openpyxl lets you create, access, and rename worksheets and add or remove their data. Accessing a worksheet by name looks straightforward, but the lookup fails if the worksheet does not exist, and Python's exception handling lets you deal with that case. In the example below, accessing a worksheet called "invalidSheet" in the workbook "test.xlsx" raises a KeyError because no such worksheet exists. The try/except block simply re-raises whatever exception occurs; the only purpose of this example is to identify the type of exception that openpyxl raises.

In [1]: import openpyxl

In [2]: book = openpyxl.load_workbook("test.xlsx")

In [3]: try:
   ...:     ws = book["invalidSheet"]  #try to access a non-existent worksheet
   ...: except:
   ...:     raise
   ...:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-4f4ac71a4f19> in <module>
      1 try:
----> 2     ws = book["invalidSheet"]
      3 except:
      4     raise
      5

~\Anaconda3\lib\site-packages\openpyxl\workbook\workbook.py in __getitem__(self, key)
    275             if sheet.title == key:
    276                 return sheet
--> 277         raise KeyError("Worksheet {0} does not exist.".format(key))
    278
    279     def __delitem__(self, key):

KeyError: 'Worksheet invalidSheet does not exist.'

This lets us write a more explicit try/except block that catches the KeyError for non-existent sheets. We will improve on this example shortly, but first let's find out the sheet names in this workbook. We use the sheetnames attribute of the Workbook object book that we created earlier:

In [15]: book.sheetnames
Out[15]: ['testSheet1', 'testSheet2']

In [16]: type(book.sheetnames)
Out[16]: list
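
Because sheetnames is a plain list, a membership test is a simpler alternative to try/except and is close to what the question attempted. The following is a minimal sketch, assuming a fresh in-memory workbook so that it runs without test.xlsx; the helper name ensure_sheet is hypothetical and not part of openpyxl:

```python
from openpyxl import Workbook


def ensure_sheet(book, sheet_name):
    """Return the worksheet named sheet_name, creating it if it is missing.

    ensure_sheet is a hypothetical helper name, not an openpyxl function.
    """
    if sheet_name in book.sheetnames:
        return book[sheet_name]
    # create_sheet() appends a new worksheet to the workbook and returns it
    return book.create_sheet(sheet_name)


book = Workbook()                        # in-memory workbook with one default sheet
ws = ensure_sheet(book, "testSheet")     # missing, so it is created
same = ensure_sheet(book, "testSheet")   # already present, so it is returned as-is
print(book.sheetnames)
```

In the question's code, the lookup book['testSheet'] sits in the branch where the sheet is known to be missing, so it cannot succeed there; creating the sheet by name with create_sheet() avoids that lookup.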

This returns a list of sheet names, which we will use to verify the sheets later. Returning to the example above, the following improved version catches the KeyError for a non-existent sheet and creates the sheet. The new sheet will not appear in the actual Excel file until we save() the workbook, but the sheet names of the in-memory Workbook object are updated immediately. You can verify this after executing the snippet:

In [20]: try:
    ...:     filename = "test.xlsx"
    ...:     sheet_name = "invalidSheet"
    ...:     ws = book[sheet_name]
    ...: except KeyError:
    ...:     print("The worksheet '{}' does not exist for workbook '{}'. Creating one...".format(
    ...:                                                                                         sheet_name,
    ...:                                                                                         filename))
    ...:     book.create_sheet(sheet_name)
    ...:     print("Worksheet '{}' created successfully for workbook '{}'.".format(sheet_name, filename))
    ...:
The worksheet 'invalidSheet' does not exist for workbook 'test.xlsx'. Creating one...
Worksheet 'invalidSheet' created successfully for workbook 'test.xlsx'.

In [21]: book.sheetnames
Out[21]: ['testSheet1', 'testSheet2', 'invalidSheet']
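
To see the difference between the in-memory object and the file on disk, one can reload the file before and after calling save(). This is only a sketch: it writes to a temporary scratch file (a hypothetical path, not the test.xlsx used above) so that nothing real is overwritten:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical scratch path so the sketch does not touch a real test.xlsx
path = os.path.join(tempfile.mkdtemp(), "scratch.xlsx")

Workbook().save(path)                  # file on disk has only the default sheet

book = load_workbook(path)
book.create_sheet("invalidSheet")      # changes the in-memory object only

on_disk_before = load_workbook(path).sheetnames   # re-read the file: sheet not there yet
book.save(path)                                   # write the change back to the file
on_disk_after = load_workbook(path).sheetnames    # re-read the file: sheet is there now

print(on_disk_before)
print(on_disk_after)
```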

Now that the sheet "invalidSheet" has been added, let's add some data and give the sheet a more meaningful name. openpyxl also supports Pandas dataframes. We will first create a dataframe, then append each row of the dataframe (including the header) to the worksheet using the dataframe_to_rows() function, then rename the worksheet, and finally save the workbook.

In [23]: import pandas as pd

In [24]: df = pd.DataFrame({"Name": ["John", "Val", "Katie"], 
                           "Favorite Pet":["dog", "cat", "guinea pig"]})   #create dataframe

In [25]: df
Out[25]:
    Name Favorite Pet
0   John          dog
1    Val          cat
2  Katie   guinea pig

In [26]: from openpyxl.utils.dataframe import dataframe_to_rows #import method

In [27]: ws = book["invalidSheet"] #create a worksheet object for the existing sheet "invalidSheet"

In [29]: for r in dataframe_to_rows(df, index=False, header=True):
    ...:     ws.append(r)    #append each df row to the worksheet
    ...:                                    
In [31]: ws['A2'].value    #verify value at cell 'A2'. Remember, the first row will be the header
Out[31]: 'John'

In [32]: ws.title = "favPetSheet" #rename the worksheet

In [33]: book.sheetnames  #verify whether the sheet was added & renamed
Out[33]: ['testSheet1', 'testSheet2', 'favPetSheet']

In [35]: book.save("test.xlsx")  #save the workbook
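
The header lands in row 1 because ws.append() writes each row it receives into the next empty row of the worksheet. A small sketch of that behaviour, using plain lists that are assumed to mirror the rows dataframe_to_rows() produced above (no Pandas needed here):

```python
from openpyxl import Workbook

book = Workbook()
ws = book.create_sheet("invalidSheet")

# Plain lists standing in for the rows taken from the dataframe above
# (an assumption for illustration; the header comes first)
rows = [
    ["Name", "Favorite Pet"],   # header row
    ["John", "dog"],
    ["Val", "cat"],
    ["Katie", "guinea pig"],
]
for r in rows:
    ws.append(r)                # each call fills the next empty row

print(ws["A1"].value, ws["A2"].value, ws.max_row)
```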

Ideally, a single function should perform all these tasks for a given workbook, worksheet and dataframe.

In [45]: def check_sheet_add_data(filename, sheetname, df):
    ...:     """Check if sheet exists for an xlsx spreadsheet and add data from dataframe to the sheet
    ...:        :param: filename - The filename of the xlsx spreadsheet
    ...:        :param: sheetname - Name of the worksheet to search for
    ...:        :param: df - A Pandas dataframe object"""
    ...:
    ...:     wb = openpyxl.load_workbook(filename)
    ...:     try:
    ...:         ws = wb[sheetname]
    ...:         print("Sheet '{}' found in workbook '{}'".format(sheetname, filename))
    ...:     except KeyError:
    ...:         print("Worksheet '{}' not found for workbook '{}'.Adding...".format(sheetname, filename))
    ...:         wb.create_sheet(sheetname)
    ...:         ws = wb[sheetname]
    ...:         print()
    ...:         print("Current sheetnames: {}".format(wb.sheetnames))
    ...:         print()
    ...:         print("Worksheet '{}' added successfully for workbook '{}'".format(sheetname, filename))
    ...:     finally:
    ...:         print()
    ...:         print("Adding data to worksheet '{}'...".format(sheetname))
    ...:         print()
    ...:         for r in dataframe_to_rows(df, index=False, header=True):
    ...:             ws.append(r)
    ...:         wb.save(filename)
    ...:         print("Workbook '{}' saved successfully.".format(filename))
    ...:         print()
    ...:         print("***End***")

With this function ready, let's test all conditions. First let's add some new data, say "Favorite Albums" for our old friends John, Val and Katie.

In [39]: df2 = pd.DataFrame({"Name":["John", "Val", "Katie"], 
                         "Favorite Album": ["Thriller", "Stairway to Heaven", "Abbey Road"]})

In [40]: df2
Out[40]:
    Name      Favorite Album
0   John            Thriller
1    Val  Stairway to Heaven
2  Katie          Abbey Road

Our workbook will be the same "test.xlsx" and our new worksheet will be called "favAlbumSheet". Testing on all conditions for existing and non-existent worksheets:

#Condition 1: Worksheet does not exist
In [44]: check_sheet_add_data(filename="test.xlsx", sheetname="favAlbumSheet", df=df2)
Worksheet 'favAlbumSheet' not found for workbook 'test.xlsx'.Adding...

Current sheetnames: ['testSheet1', 'testSheet2', 'favPetSheet', 'favAlbumSheet']

Worksheet 'favAlbumSheet' added successfully for workbook 'test.xlsx'

Adding data to worksheet 'favAlbumSheet'...

Workbook 'test.xlsx' saved successfully.

***End***

#Condition 2: Worksheet exists
In [46]: check_sheet_add_data(filename="test.xlsx", sheetname="favAlbumSheet", df=df2)
Sheet 'favAlbumSheet' found in workbook 'test.xlsx'

Adding data to worksheet 'favAlbumSheet'...

Workbook 'test.xlsx' saved successfully.

***End***

We used openpyxl's easy-to-use features to access worksheets in a valid Excel workbook and to add data from dataframes to them. With Python's exception handling, we were able to detect whether a worksheet exists (in a valid workbook) and add one when necessary. The function can be extended to catch other errors, such as an invalid filename (FileNotFoundError) or an invalid dataframe object. If you only want to check for the existence of the sheet without adding data every time, make df an optional argument (df=None) and, in the finally block, save the workbook without appending any data to the worksheet.
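
One possible shape for such an extended function is sketched below. It is not the function from this answer: the name check_sheet and its return values are hypothetical, it uses a membership test and straight-line code instead of try/finally, and it imports dataframe_to_rows only when a dataframe is actually passed:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def check_sheet(filename, sheetname, df=None):
    """Make sure sheetname exists in filename; append df rows only if df is given.

    Hypothetical helper. Returns True if the sheet was created, False if it
    already existed, and None if the workbook file could not be found.
    """
    try:
        wb = load_workbook(filename)
    except FileNotFoundError:
        print("Workbook '{}' not found.".format(filename))
        return None

    created = sheetname not in wb.sheetnames
    ws = wb.create_sheet(sheetname) if created else wb[sheetname]

    if df is not None:
        # Imported lazily so the check-only path does not need Pandas
        from openpyxl.utils.dataframe import dataframe_to_rows
        for row in dataframe_to_rows(df, index=False, header=True):
            ws.append(row)

    wb.save(filename)
    return created


# Scratch files in a temporary directory (hypothetical paths for the demo)
path = os.path.join(tempfile.mkdtemp(), "scratch.xlsx")
Workbook().save(path)

first = check_sheet(path, "favAlbumSheet")     # sheet missing, so it is created
second = check_sheet(path, "favAlbumSheet")    # sheet exists now
missing = check_sheet(os.path.join(tempfile.mkdtemp(), "nope.xlsx"), "x")
print(first, second, missing)
```

Note that whenever a dataframe is passed, its rows (header included) are appended below whatever the sheet already contains, so calling the function twice with the same dataframe stores the data twice.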

amanb answered Dec 14 '22 17:12
