I am trying to validate an Excel file using Java before dumping it to database.
Here is my code snippet which causes error.
try {
    fis = new FileInputStream(file);
    wb = new XSSFWorkbook(fis);
    XSSFSheet sh = wb.getSheet("Sheet1");
    for(int i = 0 ; i < 44 ; i++){
        XSSFCell a1 = sh.getRow(1).getCell(i);
        printXSSFCellType(a1);
    }
    
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Here is the error I get
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
    at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
    at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
    at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)
This works perfectly fine when the .xlsx file is less than 1 MB.
I understand this is because my .xlsx file is around 5-10 MB and POI tries to load the entire sheet at once in JVM memory.
What can be a possible workaround?
SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document.
There are two options available to you. Option #1 - increase the size of your JVM Heap, so that Java has more memory available to it. Processing Excel files in POI using the UserModel code is DOM based, so the whole file (including parsed form) needs to be buffered into memory. Try a question like this one for advice on how to increase the help.
Option #2, which is more work - switch to event based (SAX) processing. This only processes part of the file at a time, so needs much much less memory. However, it requires more work from you, which is why you might be better throwing a few more GB of memory at the problem - memory is cheap while programmers aren't! The SpreadSheet howto page has instructions on how to do SAX parsing of .xlsx files, and there are various example files provided by POI you can look at for advice.
.
Also, another thing - you seem to be loading a File via a stream, which is bad as it means even more stuff needs buffering into memory. See the POI Documentation for more on this, including instructions on how to work with the File directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With