I am using XSSF from Apache POI to read an XLSX file. I was getting the error java.lang.OutOfMemoryError: Java heap space. I then increased the heap size with -Xmx1024m for the Java class, but the same error still occurs.
Code:
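import java.io.FileInputStream;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

String filename = "D:\\filename.xlsx";
// try-with-resources closes the stream even if the constructor fails
try (FileInputStream fis = new FileInputStream(filename)) {
    // Loads the entire workbook into memory; the OutOfMemoryError is thrown here
    XSSFWorkbook workbook = new XSSFWorkbook(fis);
}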
In the above code segment, execution stops at the XSSFWorkbook constructor and throws the specified error.
Can someone suggest a better approach for reading large XLSX files?
XLSX files use the Office Open XML format, which Excel supports from version 2007 onward. An XLSX file is a ZIP archive of XML parts, so it is usually smaller on disk than the equivalent XLS file.
POI allows you to read Excel files in a streaming manner. The API is pretty much a wrapper around SAX. Make sure you open the OPC package in the correct way, using the OPCPackage.open overload that takes a String path (or a File). Opening it from an InputStream makes POI buffer the whole file in memory, so you could run out of memory immediately.
OPCPackage pkg = OPCPackage.open(file.getPath());
XSSFReader reader = new XSSFReader(pkg);
Now, reader will allow you to get InputStreams for the different parts. If you want to do the XML parsing yourself (using SAX or StAX), you can use these. But it requires being very familiar with the format.
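To give a feel for what parsing the raw sheet XML yourself involves, here is a minimal sketch that uses only the JDK's SAX parser. The class RawSheetSaxSketch and the method readCells are hypothetical names, not part of POI, and the XML string is a hand-written stand-in for a stream obtained from reader.getSheetsData(). It assumes the sheet uses unprefixed element names (c for a cell, v for its value), which is how Excel normally writes them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: collects "cellRef=rawValue" pairs from sheet XML.
class RawSheetSaxSketch {

    static List<String> readCells(InputStream sheetXml) {
        List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String cellRef;      // the r attribute of the current <c> element
            private boolean inValue;     // true while the parser is inside <v>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    cellRef = atts.getValue("r");
                } else if ("v".equals(qName)) {
                    inValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    cells.add(cellRef + "=" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
        return cells;
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"s\"><v>0</v></c>"
                + "<c r=\"B1\"><v>42</v></c>"
                + "</row></sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readCells(in)); // prints [A1=0, B1=42]
    }
}
```

Note that the value of A1 is not the text itself: t="s" marks the cell as a string, and the 0 is an index into the shared strings table. Resolving those indexes, along with styles and number formats, is the part that requires knowing the format well.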
An easier option is to use XSSFSheetXMLHandler. Here is an example that reads the first sheet:
StylesTable styles = reader.getStylesTable();
ReadOnlySharedStringsTable sharedStrings = new ReadOnlySharedStringsTable(pkg);
ContentHandler handler = new XSSFSheetXMLHandler(styles, sharedStrings, mySheetContentsHandler, true);
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler);
parser.parse(new InputSource(reader.getSheetsData().next()));
Here mySheetContentsHandler should be your own implementation of XSSFSheetXMLHandler.SheetContentsHandler. This object will be fed rows and cells as the sheet is parsed.
Note however that this can be moderately memory consuming if your shared strings table is huge (which happens if you don't have any duplicate strings in your huge sheets). If memory is still a problem, I recommend using the raw XML streams (also provided by XSSFReader).
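As a rough illustration of the raw-stream route, the sketch below counts the rows of one sheet with StAX and never touches the shared strings table. It uses only the JDK. The class RowCountSketch and both method names are hypothetical, and the part name xl/worksheets/sheet1.xml is an assumption about where the first sheet usually lives. In real code you would take the stream from XSSFReader instead of looking up the ZIP entry by name. The demo file is a stand-in archive with a single sheet part, not a complete workbook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI.
class RowCountSketch {

    // Streams one sheet part with StAX and counts its <row> elements.
    // The parser pulls events one at a time instead of building the sheet in memory.
    static long countRows(Path xlsx, String sheetPart) {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(sheetPart);
            if (entry == null) {
                throw new IllegalArgumentException("No such part: " + sheetPart);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
                long rows = 0;
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(xml.getLocalName())) {
                        rows++;
                    }
                }
                xml.close();
                return rows;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed sheet XML", e);
        }
    }

    // Writes a stand-in archive that contains only a sheet part (assumed part name).
    static Path writeDemoFile(int rowCount) {
        StringBuilder xml = new StringBuilder(
                "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>");
        for (int i = 1; i <= rowCount; i++) {
            xml.append("<row r=\"").append(i).append("\"><c r=\"A").append(i)
               .append("\"><v>").append(i).append("</v></c></row>");
        }
        xml.append("</sheetData></worksheet>");
        try {
            Path file = Files.createTempFile("demo", ".xlsx");
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(file))) {
                zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
                zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path demo = writeDemoFile(3);
        System.out.println(countRows(demo, "xl/worksheets/sheet1.xml")); // prints 3
        demo.toFile().delete();
    }
}
```

If you do need string cell values, you still have to resolve the shared string indexes somehow, for example by streaming the shared strings part once and keeping only the entries you care about.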