
Apache POI Java Excel Performance for Large Spreadsheets

I have a spreadsheet I'm trying to read with POI. I have it in both xls and xlsx formats, but in this case the problem is with the xls file. The spreadsheet has about 10,000 rows and 75 columns, and reading it can take several minutes, though Excel opens it in a few seconds. I'm using event-based reading rather than loading the whole file into memory. The meat of my code is below. It's a bit messy right now, but it's really just a long switch statement that was mostly copied from the POI examples.

Is it typical for POI performance using the event model to be so slow? Is there anything I can do to speed this up? Several minutes will be unacceptable for my application.

    POIFSFileSystem poifs = new POIFSFileSystem(fis);
    InputStream din = poifs.createDocumentInputStream("Workbook");
    try
    {
        HSSFRequest req = new HSSFRequest();
        listener = new FormatTrackingHSSFListener(new HSSFListener() {
            @Override
            public void processRecord(Record rec)
            {
                thisString = null;
                int sid = rec.getSid();
                switch (sid)
                {
                    case SSTRecord.sid:
                        strTable = (SSTRecord) rec;
                        break;
                    case LabelSSTRecord.sid:
                        LabelSSTRecord labelSstRec = (LabelSSTRecord) rec;
                        thisString = strTable.getString(labelSstRec
                                .getSSTIndex()).getString();
                        row = labelSstRec.getRow();
                        col = labelSstRec.getColumn();
                        break;
                    case RKRecord.sid:
                        RKRecord rrk = (RKRecord) rec;
                        thisString = "";
                        row = rrk.getRow();
                        col = rrk.getColumn();
                        break;
                    case LabelRecord.sid:
                        LabelRecord lrec = (LabelRecord) rec;
                        thisString = lrec.getValue();
                        row = lrec.getRow();
                        col = lrec.getColumn();
                        break;
                    case BlankRecord.sid:
                        BlankRecord blrec = (BlankRecord) rec;
                        thisString = "";
                        row = blrec.getRow();
                        col = blrec.getColumn();
                        break;
                    case BoolErrRecord.sid:
                        BoolErrRecord berec = (BoolErrRecord) rec;
                        row = berec.getRow();
                        col = berec.getColumn();
                        byte errVal = berec.getErrorValue();
                        thisString = errVal == 0 ? Boolean.toString(berec
                                .getBooleanValue()) : ErrorConstants
                                .getText(errVal);
                        break;
                    case FormulaRecord.sid:
                        FormulaRecord frec = (FormulaRecord) rec;
                        switch (frec.getCachedResultType())
                        {
                            case Cell.CELL_TYPE_NUMERIC:
                                double num = frec.getValue();
                                if (Double.isNaN(num))
                                {
                                    // Formula result is a string
                                    // This is stored in the next record
                                    outputNextStringRecord = true;
                                }
                                else
                                {
                                    thisString = formatNumericValue(frec, num);
                                }
                                break;
                            case Cell.CELL_TYPE_BOOLEAN:
                                thisString = Boolean.toString(frec
                                        .getCachedBooleanValue());
                                break;
                            case Cell.CELL_TYPE_ERROR:
                                thisString = HSSFErrorConstants
                                        .getText(frec.getCachedErrorValue());
                                break;
                            case Cell.CELL_TYPE_STRING:
                                outputNextStringRecord = true;
                                break;
                        }
                        row = frec.getRow();
                        col = frec.getColumn();
                        break;
                    case StringRecord.sid:
                        if (outputNextStringRecord)
                        {
                            // String for formula
                            StringRecord srec = (StringRecord) rec;
                            thisString = srec.getString();
                            outputNextStringRecord = false;
                        }
                        break;
                    case NumberRecord.sid:
                        NumberRecord numRec = (NumberRecord) rec;
                        row = numRec.getRow();
                        col = numRec.getColumn();
                        thisString = formatNumericValue(numRec, numRec
                                .getValue());
                        break;
                    case NoteRecord.sid:
                        NoteRecord noteRec = (NoteRecord) rec;
                        row = noteRec.getRow();
                        col = noteRec.getColumn();
                        thisString = "";
                        break;
                    case EOFRecord.sid:
                        inSheet = false;
                }
                if (thisString != null)
                {
                    // do something with the cell value 
                }
            }
        });
        req.addListenerForAllRecords(listener);
        HSSFEventFactory factory = new HSSFEventFactory();
        factory.processEvents(req, din);
    }
    finally
    {
        // Close the document stream whether or not parsing succeeded
        din.close();
    }
Jeff Storey asked May 13 '11 13:05

2 Answers

If you are using Apache POI to generate a large Excel file, take note of the following line:

    sheet.autoSizeColumn((short) p);

This call is relatively slow on large sheets, so it will degrade performance, especially if it runs more than once per column.
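One way to avoid auto-sizing altogether is to track the longest text written to each column and set every width once at the end. The following is only a sketch under stated assumptions: `ColumnWidthTracker` and its methods are hypothetical names, not part of POI, and character count is just a rough stand-in for rendered width. The only POI call it assumes is `Sheet.setColumnWidth(int, int)`, which takes a width in units of 1/256 of a character; that call appears only in a comment so the block compiles without POI on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of POI): remembers the longest text seen per column.
final class ColumnWidthTracker {
    // POI column widths are expressed in 1/256ths of a character, up to 255 characters.
    static final int UNITS_PER_CHAR = 256;
    static final int MAX_CHARS = 255;
    // Assumed padding so text does not touch the cell border.
    static final int PADDING_CHARS = 2;

    private final Map<Integer, Integer> maxChars = new TreeMap<>();

    // Call this for every cell value as it is written.
    void record(int column, String text) {
        int length = (text == null) ? 0 : text.length();
        maxChars.merge(column, length, Math::max);
    }

    // Width for the column in POI units, capped at the 255-character limit.
    int widthUnits(int column) {
        int chars = maxChars.getOrDefault(column, 0) + PADDING_CHARS;
        return Math.min(chars, MAX_CHARS) * UNITS_PER_CHAR;
    }

    Iterable<Integer> columns() {
        return maxChars.keySet();
    }

    public static void main(String[] args) {
        ColumnWidthTracker tracker = new ColumnWidthTracker();
        tracker.record(0, "id");
        tracker.record(0, "customer-0001");
        tracker.record(1, "total");
        for (int col : tracker.columns()) {
            // With POI on the classpath: sheet.setColumnWidth(col, tracker.widthUnits(col));
            System.out.println("column " + col + " -> " + tracker.widthUnits(col));
        }
    }
}
```

If exact fitting matters more than speed, calling `autoSizeColumn` once per column after all rows are written is the other obvious option.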

sarath answered Sep 20 '22 12:09


I have also processed thousands of large Excel files, and in my opinion POI is very fast. Loading those files took about 1 minute in Excel itself, so I would say the problem lies outside the POI code.
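To check whether the time goes to POI or to your own per-cell handling, one option is to time the handler separately from the whole parse. This is a minimal sketch using only the JDK: `HandlerTimer` is a hypothetical name, the `Consumer<String>` stands in for whatever the listener does with each cell value, and the loop in `main` is only a placeholder for the real `processEvents` call.

```java
import java.util.function.Consumer;

// Hypothetical helper: wraps a per-cell handler and accumulates the time spent inside it.
final class HandlerTimer implements Consumer<String> {
    private final Consumer<String> delegate;
    private long nanosInHandler;
    private long calls;

    HandlerTimer(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(String cellValue) {
        long start = System.nanoTime();
        try {
            delegate.accept(cellValue);
        } finally {
            // Count the call even if the handler throws.
            nanosInHandler += System.nanoTime() - start;
            calls++;
        }
    }

    long calls() {
        return calls;
    }

    long nanosInHandler() {
        return nanosInHandler;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        HandlerTimer timed = new HandlerTimer(sink::append);

        long totalStart = System.nanoTime();
        // Placeholder for the real parse: 10,000 rows x 75 columns of cell callbacks.
        for (int i = 0; i < 10_000 * 75; i++) {
            timed.accept("v");
        }
        long totalNanos = System.nanoTime() - totalStart;

        System.out.printf("cells=%d handler=%d ms total=%d ms%n",
                timed.calls(), timed.nanosInHandler() / 1_000_000, totalNanos / 1_000_000);
    }
}
```

If the handler accounts for most of the total, the slowdown is in the code that consumes the cell values rather than in the record parsing.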

ludwigm answered Sep 18 '22 12:09