Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error While Reading Large Excel Files (xlsx) Via Apache POI

I am trying to read large excel files xlsx via Apache POI, say 40-50 MB. I am getting out of memory exception. The current heap memory is 3GB.

I can read smaller excel files without any issues. I need a way to read large excel files and then them back as response via Spring excel view.

public class FetchExcel extends AbstractView {


    @Override
    protected void renderMergedOutputModel(
            Map model, HttpServletRequest request, HttpServletResponse response) 
    throws Exception {

    String fileName = "SomeExcel.xlsx";

    response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");

    XSSFWorkbook workbook = new XSSFWorkbook(pkg);

    ServletOutputStream respOut = response.getOutputStream();

    pkg.close();
    workbook.write(respOut);
    respOut.flush();

    workbook = null;                    

    response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");


    }    

}

I first started off using XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in); but that was costly per Apache POI API, so I switched to OPC package way but still the same effect. I don't need to parse or process the file, just read it and return it.

like image 996
jamesT Avatar asked Oct 22 '12 22:10

jamesT


People also ask

Does Apache POI support XLSX?

Apache POI provides excellent support for working with Microsoft Excel documents. Apache POI is able to handle both XLS and XLSX formats of spreadsheets. Some important points about Apache POI API are: Apache POI contains HSSF implementation for Excel '97(-2007) file format i.e XLS.

How do you fix Excel Cannot open the file XLSX because the file format?

Step 1: Launch Microsoft Excel on your PC. Step 2: Head to the “files” section, and click “Export.” Step 3: In the section, select “change file type” and click on the file with the error. Step 4: Change the file extension and save the file.

Does Apache POI help to read Excel file?

Apache POI is an open-source java library designed for reading and writing Microsoft documents in order to create and manipulate various file formats based on Microsoft Office.


2 Answers

You don't mention whether you need to modify the spreadsheet or not.

This may be obvious, but if you don't need to modify the spreadsheet, then you don't need to parse it and write it back out, you can simply read bytes from the file, and write out bytes, as you would with, say an image, or any other binary format.

If you do need to modify the spreadsheet before sending it to the user, then to my knowledge, you may have to take a different approach.

Every library that I'm aware of for reading Excel files in Java reads the whole spreadsheet into memory, so you'd have to have 50MB of memory available for every spreadsheet that could possibly be concurrently processed. This involves, as others have pointed out, adjusting the heap available to the VM.

If you need to process a large number of spreadsheets concurrently, and can't allocate enough memory, consider using a format that can be streamed, instead of read all at once into memory. CSV format can be opened by Excel, and I've had good results in the past by setting the content-type to application/vnd.ms-excel, setting the attachment filename to something ending in ".xls", but actually returning CSV content. I haven't tried this in a couple of years, so YMMV.

like image 196
GreyBeardedGeek Avatar answered Oct 22 '22 18:10

GreyBeardedGeek


Here is an example to read a large xls file using sax parser.

public void parseExcel(File file) throws IOException {

        OPCPackage container;
        try {
            container = OPCPackage.open(file.getAbsolutePath());
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
            XSSFReader xssfReader = new XSSFReader(container);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            while (iter.hasNext()) {
                InputStream stream = iter.next();

                processSheet(styles, strings, stream);
                stream.close();
            }
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (OpenXML4JException e) {
            e.printStackTrace();
        }

}

protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {

        InputSource sheetSource = new InputSource(sheetInputStream);
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            XMLReader sheetParser = saxParser.getXMLReader();
            ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {

            @Override
                public void startRow(int rowNum) {
                }
                @Override
                public void endRow() {
                }
                @Override
                public void cell(String cellReference, String formattedValue) {
                }
                @Override
                public void headerFooter(String text, boolean isHeader, String tagName) {

                }

            }, 
            false//means result instead of formula
            );
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
like image 22
O.C. Avatar answered Oct 22 '22 16:10

O.C.