
Python: fastest way to write pandas DataFrame to Excel on multiple sheets

I need to export 24 pandas data frames (140 columns x 400 rows each) to Excel, each into a different sheet.

I am using pandas’ built-in ExcelWriter. Running 24 scenarios, it takes:

51 seconds to write to an .xls file (using xlwt)

86 seconds to write to an .xlsx file (using XlsxWriter)

141 seconds to write to an .xlsm file (using openpyxl)

21 seconds to just run the program (no Excel output)
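A minimal sketch of the kind of export being timed here, not the actual program. It assumes random placeholder data in the same shape (24 frames of 400 rows x 140 columns); `make_frames`, `write_frames`, and the `scenario_N` sheet names are hypothetical and only serve the illustration.

```python
import time

import numpy as np
import pandas as pd


def make_frames(n_sheets=24, rows=400, cols=140):
    """Build placeholder frames shaped like the ones described above."""
    rng = np.random.default_rng(0)
    return {
        f"scenario_{i + 1}": pd.DataFrame(
            rng.random((rows, cols)),
            columns=[f"col_{j}" for j in range(cols)],
        )
        for i in range(n_sheets)
    }


def write_frames(frames, path, engine):
    """Write each frame to its own sheet and return the elapsed seconds."""
    start = time.perf_counter()
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path, engine=engine) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
    return time.perf_counter() - start
```

Calling `write_frames(make_frames(), "scenarios.xlsx", engine="xlsxwriter")` and then repeating with `engine="openpyxl"` gives a like-for-like comparison on one machine. The chosen engine has to be installed separately, and recent pandas releases no longer include the xlwt engine, so the .xls timing may not be reproducible there.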

The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click on the ‘comma’ button to format the numbers, it tells me: ‘style comma not found’. I don’t get this problem writing to an .xlsx, but that’s even slower.

Any suggestions on how to make the exporting faster? I can’t be the first one to have this problem, yet after hours of searching forums and websites I haven’t found any definite solution.

The only thing I can think of is to use Python to export to csv files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.
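The Python half of that workaround could look roughly like the sketch below. It assumes the data frames are held in a dict keyed by the intended sheet name; `export_frames_to_csv` and the output directory are hypothetical names, and the Excel macro that merges the files is not shown.

```python
from pathlib import Path

import pandas as pd


def export_frames_to_csv(frames, out_dir):
    """Write one CSV per frame, named after its intended sheet."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sheet_name, frame in frames.items():
        path = out_dir / f"{sheet_name}.csv"
        # index=False keeps the row index out of the file.
        frame.to_csv(path, index=False)
        paths.append(path)
    return paths
```

CSV files carry no number formats or styles, so any formatting would still have to be applied on the Excel side after the merge.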

The .xls file is 10 MB, and the .xlsx 5.2 MB

Thanks!

Pythonista anonymous asked Sep 16 '14 07:09



2 Answers

Here is a benchmark for different Python to Excel modules.

And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:

Versions:
    python      : 2.7.7
    openpyxl    : 2.0.5
    pyexcelerate: 0.6.3
    xlsxwriter  : 0.5.7
    xlwt        : 0.7.5

Dimensions:
    Rows = 9600 (400 x 24)
    Cols = 140

Times:
    pyexcelerate          :  11.85
    xlwt                  :  17.64
    xlsxwriter (optimised):  21.63
    xlsxwriter            :  26.76
    openpyxl   (optimised):  95.18
    openpyxl              : 119.29

As with any benchmark, the results will depend on the Python and module versions, CPU, RAM, and disk I/O, as well as on the benchmark itself. So make sure to verify these results on your own setup.

Also, since you asked specifically about pandas, please note that pandas doesn't support PyExcelerate as a writer engine.
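One way around that is to hand the data to PyExcelerate directly instead of going through pandas' writer. The following is a rough sketch under that assumption, not something measured in the benchmark above; `frame_to_rows` and `write_with_pyexcelerate` are hypothetical helper names, and the frames are assumed to be in a dict keyed by sheet name.

```python
import pandas as pd


def frame_to_rows(frame):
    """Return the frame as a list of rows, with the column names first."""
    return [frame.columns.tolist()] + frame.values.tolist()


def write_with_pyexcelerate(frames, path):
    """Write each frame to its own sheet using PyExcelerate directly."""
    # Imported here so frame_to_rows works without PyExcelerate installed.
    from pyexcelerate import Workbook

    workbook = Workbook()
    for sheet_name, frame in frames.items():
        workbook.new_sheet(sheet_name, data=frame_to_rows(frame))
    workbook.save(path)
```

This bypasses `DataFrame.to_excel` entirely, so the row index is not written and any formatting has to be applied through PyExcelerate's own API.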

jmcnamara answered Sep 20 '22 04:09


For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty so I had to guess most of this!

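import xlwt

wb = xlwt.Workbook()
ws0 = wb.add_sheet('Sheet1')     # ws0 is a worksheet

style = xlwt.XFStyle()
style.font.name = 'Courier'
style.font.height = 180          # height is in 1/20 pt, so 180 = 9 pt
style.num_format_str = '#,##0'   # thousands separator, no decimals

row, col, value = 0, 0, 1234567  # example cell position and value
ws0.write(row, col, value, style)
wb.save('formatted.xls')         # example file name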

Also, I believe I reproduced your error message when trying to format the resulting spreadsheet in Excel (Office 2010). It's weird, but some of the drop-down toolbar format options work and some don't. They all seem to work fine, though, if I go to "Format Cells" via a right click.

JohnE answered Sep 22 '22 04:09