
Differences between xlwings and openpyxl when reading Excel workbooks

I've mostly only used xlwings to open (read-write) workbooks (since the workbooks I read have complicated macros). But I've recently begun using openpyxl to open (read-only) workbooks when I've needed to read thousands of workbooks to scrape some data.

I've noticed that there is a considerable difference between how xlwings and openpyxl read workbooks. I believe xlwings relies on pywin32 to read workbooks. When you read a workbook with xlwings.Book(<filename>) the actual workbook opens up. I have a feeling this is a result of pywin32.

However, when using openpyxl.load_workbook(<filename>) a workbook window does not appear. I have a feeling this is a result of not using pywin32.

Beyond this, I have no further understanding of how the backend works for each library. Could someone shed some light on this? Is there a benefit or cost to relying on xlwings and pywin32 for reading workbooks, as opposed to openpyxl, which does not seem to use pywin32?

Jon asked Oct 10 '19 18:10




1 Answer

You are correct that xlwings relies on pywin32 (at least on Windows), whereas openpyxl does not.

openpyxl

A ".xlsx" Excel file is essentially a zip file containing multiple XML files formatted according to Microsoft's OOXML specification. With this specification it's possible to write a program that reads and writes Excel files directly, in just about any programming language. This is the approach taken by openpyxl: it uses Python code to read and write Excel files directly, without starting Excel.
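To make the "zip file of XML" point concrete, here is a minimal sketch that uses only the Python standard library. It is not how openpyxl is implemented internally; the function names `list_parts` and `sheet_names` are made up for illustration, and the sketch assumes the workbook stores its sheet list in the usual `xl/workbook.xml` part.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by SpreadsheetML parts such as xl/workbook.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_parts(path):
    """Return the names of the files stored inside the .xlsx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def sheet_names(path):
    """Read the sheet names straight from xl/workbook.xml, without Excel."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Each <sheet> element carries the visible sheet name in its "name" attribute.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

Calling `list_parts("some_workbook.xlsx")` on a real workbook would typically show entries such as `[Content_Types].xml` and `xl/workbook.xml`. No Excel window appears because Excel is never involved.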

xlwings

A Microsoft Excel application can be started and controlled by an external program through the Win32 COM API. The pywin32 package provides an interface between Win32 COM and Python. With a Python script and the right pywin32 commands you can fully control an Excel application (open Excel files, query data from cells, write data to cells, save Excel files, etc.). The pywin32 commands that you can use mirror the Excel VBA commands, albeit with Python syntax.

xlwings is (among other things) a user-friendly wrapper around pywin32. It introduces several concise yet powerful methods. One example is the direct conversion of an Excel cell range to a NumPy array or pandas DataFrame (and vice versa).
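As a rough sketch of the range-to-DataFrame conversion mentioned above, the function below opens a workbook through a hidden Excel instance and reads the table that starts at cell A1. The function name `read_table_with_xlwings` is made up for illustration. It assumes xlwings, pandas, and a local Excel installation are available, and that the first sheet holds a contiguous table with a header row.

```python
def read_table_with_xlwings(path):
    """Return the table starting at A1 on the first sheet as a pandas DataFrame."""
    # Imported lazily: these need xlwings, pandas, and a local Excel install.
    import pandas as pd
    import xlwings as xw

    app = xw.App(visible=False)  # start Excel without showing a window
    try:
        book = app.books.open(path)
        cells = book.sheets[0].range("A1")
        # expand="table" grows the range to cover the contiguous block of cells;
        # with the default converter options the first row is used as the header
        # and the first column as the index.
        return cells.options(pd.DataFrame, expand="table").value
    finally:
        app.quit()  # shut the Excel process down without saving
```

Because each call drives a real Excel process, this is usually much slower than parsing the file directly, which matters when scraping thousands of workbooks. In exchange, Excel itself does the work, so formulas can be recalculated and macros remain available.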

Summary

A fundamental difference between xlwings and openpyxl is that the former requires that MS Excel is installed on your machine, whereas the latter does not.

Xukrao answered Oct 06 '22 16:10
