Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating big Excel sheets programmatically

We are using OpenPyxl to export MySQL content to Microsoft Excel in XSLX format

https://bitbucket.org/ericgazoni/openpyxl/overview

However, the amount of data we are dealing with is big. We are running to out of memory situation. Tables may contain up to 400 columns in 50000+ rows. Even the files are big, they are not that big that Microsoft Excel or OpenOffice should have problems with them. We are assuming our issues mainly stem from the fact that Python keeps XML DOM structure in memory in not efficient enough manner.

EDIT: Eric, the author of OpenPyxl, pointed out that there is an option to make OpenPyxl write with fixed memory usage. However, this didn't solve our problem completely, as we still have issues with raw speed and something else taking up too much memory in Python.

Now we are looking for more efficient ways to create Excel files. With Python preferably, but if we cannot find a good solution we might want to look other programming languages as well.

Options, not in any specific order, include

1) Using OpenOffice and PyUno and hope their memory structures are more efficient than with OpenPyxl and the TCP/IP call bridge is efficient enough

2) Openpyxl uses xml.etree. Would Python lxml (libxml2 native extension) be more efficient wit XML memory structures and is it possible to replace xml.etree directly with lxml drop-in e.g. with monkey-patching? (later the changes could be contributed back to Openpyxl if there is a clear benefit)

3) Export from MySQL to CSV and then post-process CSV files directly to XSLX using Python and file iteration

4) Use other programming languages and libraries (Java)

Pointers:

http://dev.lethain.com/handling-very-large-csv-and-xml-files-in-python/

http://enginoz.wordpress.com/2010/03/31/writing-xlsx-with-java/

like image 627
Mikko Ohtamaa Avatar asked Apr 18 '11 09:04

Mikko Ohtamaa


People also ask

How do I make a large Excel file?

To do this, go to Page Layout > Page Setup > Orientation, and click Landscape. Consider using a larger paper size to accommodate many columns. To switch the default paper size, go to Page Layout > Page Setup > Size, and then choose the size you want.

How much RAM does heavy Excel use?

For Excel 32-bit, the maximum memory limit, or RAM, is 2GB. Oracle testing showed that when Excel 32-bit memory usage reaches beyond 700 MB, Excel can behave abnormally and can terminate unexpectedly. The 64-bit versions of Excel are not constrained to 2 GB of RAM, as are the 32-bit Office applications.


2 Answers

If you're going to use Java, you will want to use Apache POI, but likely not the regular UserModel as you're wanting to keep your memory footprint down.

Instead, take a look at BigGridDemo, which shows you how to write a very large xlsx file using POI, with most of the work not happening in memory.

You might also find that the technique used in the BigGridDemo could equally be used in Python?

like image 101
Gagravarr Avatar answered Sep 30 '22 23:09

Gagravarr


Have you tried to look at the optimized writer for openpyxl ? It's a recent feature (2 months old), but it's quite robust (used in production in several corporate projects) and can handle almost indefinite amount of data with steady memory consumption (around 7Mb)

http://packages.python.org/openpyxl/optimized.html#optimized-writer

like image 31
Eric Gazoni Avatar answered Sep 30 '22 23:09

Eric Gazoni