Of the following two lines, the first takes 45 seconds and the second over a minute and a half. Something.xlsx is 4 MB and the changes are minor. Is there something wrong?
import openpyxl
something = openpyxl.load_workbook('Something.xlsx')
something.save('Something.xlsx')
Some details: I'm using Python 2.7.3 on Windows 7, the workbook has 2 sheets, the first of which has 67610 rows, and I'm not accessing any network to do this job.
Step 3: Load with openpyxl. The file is loaded into memory, but the data is read through a generator, which allows mapped retrieval of values. Still slow, but slightly faster than pandas.
I would like to add that openpyxl does not offer convenient support for calculating formulas; if you need to read or calculate formula results, xlwings is easier to use in this regard. In terms of speed, however, openpyxl seems to be faster.
The openpyxl.load_workbook() function takes a filename and returns a Workbook object. This Workbook object represents the Excel file, a bit like how a File object represents an opened text file.
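To make the Workbook object a bit more concrete, here is a minimal sketch that loads a file and reports the size of each sheet. The helper name `describe_workbook` is made up for illustration; only `load_workbook`, `worksheets`, `title`, `max_row` and `max_column` come from openpyxl.

```python
import openpyxl


def describe_workbook(path):
    """Return {sheet title: (row count, column count)} for a workbook.

    Hypothetical helper for illustration; it is not part of openpyxl.
    """
    # load_workbook returns a Workbook object representing the whole file.
    workbook = openpyxl.load_workbook(path)
    return {
        sheet.title: (sheet.max_row, sheet.max_column)
        for sheet in workbook.worksheets
    }
```

Calling `describe_workbook('Something.xlsx')` on the file from the question should list both sheets along with their dimensions, which is a quick way to confirm how much data is actually being loaded.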
So I created a 67k-row Excel sheet with only 4 columns of random decimal data, and the sheet was almost 5 MB, so >1000x what you said in your question. Given that this is a decent amount of data, I would suggest using the optimized reader rather than the normal one. Here's a link to the tutorial:
https://openpyxl.readthedocs.org/en/latest/optimized.html
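As a rough sketch of what the optimized reader looks like in practice, and assuming a reasonably recent openpyxl where it is enabled with `read_only=True` (older releases used a different flag) and where `iter_rows` accepts `values_only`, something like the following streams rows instead of building every cell in memory. The function name `count_rows_read_only` is hypothetical.

```python
import openpyxl


def count_rows_read_only(path):
    """Count non-empty rows per sheet using openpyxl's read-only mode.

    Hypothetical helper for illustration; it is not part of openpyxl.
    """
    # Assumption: read_only=True is available in the installed openpyxl.
    # In this mode rows are read lazily rather than loaded all at once.
    workbook = openpyxl.load_workbook(path, read_only=True)
    try:
        counts = {}
        for sheet in workbook.worksheets:
            counts[sheet.title] = sum(
                1
                for row in sheet.iter_rows(values_only=True)
                if any(value is not None for value in row)
            )
        return counts
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        workbook.close()
```

Note that a workbook opened this way is meant for reading only, so it does not replace the load-then-save pattern from the question; it helps mainly when the slow part is getting the data out.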
Hopefully this helps!