Apache POI Java Excel Performance for Large Spreadsheets

Question

I have a spreadsheet I'm trying to read with POI (I have both xls and xlsx formats), but in this case, the problem is with the xls file. My spreadsheet has about 10,000 rows and 75 columns, and reading it in can take several minutes (though Excel opens in a few seconds). I'm using the event based reading, rather than reading the whole file into memory. The meat of my code is below. It's a bit messy right now, but it's really just a long switch statement that was mostly copied from the POI examples.

Is it typical for POI performance using the event model to be so slow? Is there anything I an do to speed this up? I think several minutes will be unacceptable for my application.

    POIFSFileSystem poifs = new POIFSFileSystem(fis);
    InputStream din = poifs.createDocumentInputStream("Workbook");
    try
    {
        HSSFRequest req = new HSSFRequest();
        listener = new FormatTrackingHSSFListener(new HSSFListener() {
            @Override
            public void processRecord(Record rec)
            {
                thisString = null;
                int sid = rec.getSid();
                switch (sid)
                {
                    case SSTRecord.sid:
                        strTable = (SSTRecord) rec;
                        break;
                    case LabelSSTRecord.sid:
                        LabelSSTRecord labelSstRec = (LabelSSTRecord) rec;
                        thisString = strTable.getString(labelSstRec
                                .getSSTIndex()).getString();
                        row = labelSstRec.getRow();
                        col = labelSstRec.getColumn();
                        break;
                    case RKRecord.sid:
                        RKRecord rrk = (RKRecord) rec;
                        thisString = "";
                        row = rrk.getRow();
                        col = rrk.getColumn();
                        break;
                    case LabelRecord.sid:
                        LabelRecord lrec = (LabelRecord) rec;
                        thisString = lrec.getValue();
                        row = lrec.getRow();
                        col = lrec.getColumn();
                        break;
                    case BlankRecord.sid:
                        BlankRecord blrec = (BlankRecord) rec;
                        thisString = "";
                        row = blrec.getRow();
                        col = blrec.getColumn();
                        break;
                    case BoolErrRecord.sid:
                        BoolErrRecord berec = (BoolErrRecord) rec;
                        row = berec.getRow();
                        col = berec.getColumn();
                        byte errVal = berec.getErrorValue();
                        thisString = errVal == 0 ? Boolean.toString(berec
                                .getBooleanValue()) : ErrorConstants
                                .getText(errVal);
                        break;
                    case FormulaRecord.sid:
                        FormulaRecord frec = (FormulaRecord) rec;
                        switch (frec.getCachedResultType())
                        {
                            case Cell.CELL_TYPE_NUMERIC:
                                double num = frec.getValue();
                                if (Double.isNaN(num))
                                {
                                    // Formula result is a string
                                    // This is stored in the next record
                                    outputNextStringRecord = true;
                                }
                                else
                                {
                                    thisString = formatNumericValue(frec, num);
                                }
                                break;
                            case Cell.CELL_TYPE_BOOLEAN:
                                thisString = Boolean.toString(frec
                                        .getCachedBooleanValue());
                                break;
                            case Cell.CELL_TYPE_ERROR:
                                thisString = HSSFErrorConstants
                                        .getText(frec.getCachedErrorValue());
                                break;
                            case Cell.CELL_TYPE_STRING:
                                outputNextStringRecord = true;
                                break;
                        }
                        row = frec.getRow();
                        col = frec.getColumn();
                        break;
                    case StringRecord.sid:
                        if (outputNextStringRecord)
                        {
                            // String for formula
                            StringRecord srec = (StringRecord) rec;
                            thisString = srec.getString();
                            outputNextStringRecord = false;
                        }
                        break;
                    case NumberRecord.sid:
                        NumberRecord numRec = (NumberRecord) rec;
                        row = numRec.getRow();
                        col = numRec.getColumn();
                        thisString = formatNumericValue(numRec, numRec
                                .getValue());
                        break;
                    case NoteRecord.sid:
                        NoteRecord noteRec = (NoteRecord) rec;
                        row = noteRec.getRow();
                        col = noteRec.getColumn();
                        thisString = "";
                        break;
                    case EOFRecord.sid:
                        inSheet = false;
                }
                if (thisString != null)
                {
                    // do something with the cell value 
                }
            }
        });
        req.addListenerForAllRecords(listener);
        HSSFEventFactory factory = new HSSFEventFactory();
        factory.processEvents(req, din);

sarath · Accepted Answer

f you are using Apache POI to generate large excel file, please take note the following line :

sheet.autoSizeColumn((short) p);

Because this will degrade the performance.

ludwigm · Answer

I did also some processing with thousands of large excel files and in my opinion POI is very fast. Loading that excel files tooks also about 1 minute in Excel itself. So i would confirm that the problem lies out of POI code

Apache POI Java Excel Performance for Large Spreadsheets

Tags:

java

performance

excel

apache-poi

Jeff Storey

2 Answers

sarath

ludwigm

Recent Activity

Donate For Us

Apache POI Java Excel Performance for Large Spreadsheets

Tags:

java

performance

excel

apache-poi

Jeff Storey

2 Answers

sarath

ludwigm

Related questions

Recent Activity

Donate For Us