Is there a memory-efficient Java library to read large Microsoft Excel files (both .xls and .xlsx)? I have very limited experience with Apache POI, and it seemed to be a huge memory hog from what I recall (though perhaps this was just for writing and not for reading). Is there something better? Or am I misremembering and/or misusing POI?
It would be important for it to have a "friendly" open-source license as well.
In Java, reading Excel files is not like reading Word files, because an Excel workbook is organized into sheets, rows and cells. The JDK does not provide an API to read or write Microsoft Excel or Word documents, so we have to rely on a third-party library, Apache POI.
You need poi-3.12.jar to read XLS files, and poi-ooxml-3.12.jar (together with poi-3.12.jar) to read XLSX files in Java.
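A minimal sketch of reading a workbook with POI's usermodel API, assuming a recent POI release (4.x or 5.x) with both poi and poi-ooxml on the classpath. The class name PoiGridReader and its method are hypothetical. WorkbookFactory chooses HSSF or XSSF from the file's contents, but it builds the entire workbook in memory, which is exactly the memory cost the question is worried about.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class PoiGridReader {
    /** Reads the first sheet of an .xls or .xlsx file into rows of display strings. */
    public static List<List<String>> readFirstSheet(File file) throws Exception {
        // DataFormatter renders each cell roughly as Excel would display it.
        DataFormatter formatter = new DataFormatter();
        List<List<String>> rows = new ArrayList<>();
        // WorkbookFactory picks HSSF or XSSF; the whole workbook is loaded into memory.
        try (Workbook workbook = WorkbookFactory.create(file)) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                List<String> values = new ArrayList<>();
                // Iterating a Row skips cells that were never created, so columns may shift.
                for (Cell cell : row) {
                    values.add(formatter.formatCellValue(cell));
                }
                rows.add(values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (List<String> row : readFirstSheet(new File(args[0]))) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

This is convenient for small and medium files; for very large ones, the event-based approach avoids holding the whole workbook.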
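Apache's POI library has an event-based API with a smaller memory footprint. For HSSF (Horrible Spreadsheet Format, the .xls format) it lives in org.apache.poi.hssf.eventusermodel; for XSSF (XML Spreadsheet Format, used for OOXML .xlsx files) POI provides a separate SAX-based streaming API in org.apache.poi.xssf.eventusermodel (XSSFReader together with XSSFSheetXMLHandler).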
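Current POI releases include a SAX-based event API for .xlsx files as well. The sketch below assumes POI 4.x or 5.x; StreamingXlsxReader and RowSink are hypothetical names. XSSFReader exposes each worksheet's XML as an InputStream, and XSSFSheetXMLHandler turns the SAX events into per-cell callbacks, so your own code only holds one row at a time. Note that ReadOnlySharedStringsTable still loads the workbook's shared-string table into memory.

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingXlsxReader {
    /** Hypothetical adapter: turns POI's per-cell callbacks into one List<String> per row. */
    static class RowSink implements SheetContentsHandler {
        private final Consumer<List<String>> consumer;
        private List<String> current = new ArrayList<>();

        RowSink(Consumer<List<String>> consumer) { this.consumer = consumer; }

        @Override public void startRow(int rowNum) { current = new ArrayList<>(); }

        // Cells without a stored value are generally not reported, so use
        // cellReference (e.g. "C7") if exact column positions matter.
        @Override public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            current.add(formattedValue);
        }

        @Override public void endRow(int rowNum) { consumer.accept(current); }

        @Override public void headerFooter(String text, boolean isHeader, String tagName) {
            // Headers and footers are ignored in this sketch.
        }
    }

    /** Streams every sheet of an .xlsx file, handing each row to the sink. */
    public static void forEachRow(File xlsx, Consumer<List<String>> sink) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ)) {
            XSSFReader reader = new XSSFReader(pkg);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            StylesTable styles = reader.getStylesTable();
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // the handler matches SpreadsheetML namespaces
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    XMLReader parser = factory.newSAXParser().getXMLReader();
                    // false = report cached formula results rather than formula text
                    parser.setContentHandler(
                            new XSSFSheetXMLHandler(styles, strings, new RowSink(sink), false));
                    parser.parse(new InputSource(sheet));
                }
            }
        }
    }
}
```

A caller could pass something like row -> System.out.println(String.join(",", row)) as the sink, processing each row and then letting it be garbage-collected.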
The Excel file formats are (both) huge and extremely complicated, and anything that reads all of their possible contents is going to be equally huge and complicated. Remember that they can contain ranges, macros, links, embedded objects, etc.
However, if you are reading something simple like a grid of numbers, I recommend first converting the spreadsheet to a simpler format such as CSV and then reading that.
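Once the sheet has been exported to CSV, a numeric grid can be read with the JDK alone, one line at a time. This is a minimal sketch that assumes a plain export with no quoted fields or embedded commas; CsvGrid is a hypothetical name, and empty cells become NaN.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class CsvGrid {
    /**
     * Parses a plain numeric CSV grid into rows of doubles.
     * Assumes no quoting and no commas inside values.
     */
    public static List<double[]> parse(Reader source) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty fields, e.g. "3," -> ["3", ""]
                String[] fields = line.split(",", -1);
                double[] row = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    String f = fields[i].trim();
                    row[i] = f.isEmpty() ? Double.NaN : Double.parseDouble(f);
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

For a file, you would pass something like new FileReader("data.csv") (a hypothetical path). If only aggregates are needed, the same loop can process each row instead of collecting them into a list.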