Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error reading Excel .XLSX with Apache POI

I am using Apache POI 3.8 libraries to read an XLSX file in a web application. The following code works perfectly fine from a Java console app:

InputStream inputFS = new FileInputStream("test.xlsx");
Workbook workbook = new XSSFWorkbook(inputFS); // below exception is thrown on this line
Sheet sheet = workbook.getSheetAt(0);

but throws a "read error" when used in the web application. A relevant extract of the stack trace is pasted below:

java.io.IOException: Read error
at java.io.FileInputStream.readBytes(Native Method) ~[na:1.6.0_31]
at java.io.FileInputStream.read(Unknown Source) ~[na:1.6.0_31]
at java.io.FilterInputStream.read(Unknown Source) ~[na:1.6.0_31]
at java.io.PushbackInputStream.read(Unknown Source) ~[na:1.6.0_31]
at java.util.zip.ZipInputStream.readFully(Unknown Source) ~[na:1.6.0_31]
at java.util.zip.ZipInputStream.readLOC(Unknown Source) ~[na:1.6.0_31]
at java.util.zip.ZipInputStream.getNextEntry(Unknown Source) ~[na:1.6.0_31]
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51) ~[poi-ooxml-3.8-20120326.jar:3.8]
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:83) ~[poi-ooxml-3.8-20120326.jar:3.8]
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228) ~[poi-ooxml-3.8-20120326.jar:3.8]
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39) ~[poi-ooxml-3.8-20120326.jar:3.8]
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:187) ~[poi-ooxml-3.8-20120326.jar:3.8]
at com.corp.ReportManager.parseExcelReport(ReportManager.java:575) [ReportManager.class:na]

The following JARs are included in the classpath (in the same order):

poi-3.8-20120326.jar
poi-ooxml-3.8-20120326.jar
poi-ooxml-schemas-3.8-20120326.jar
xbean.jar
dom4j-1.6.1.jar

There does not seem to be an memory related issues since I gathered some heap utilization stats just before invoking the above code. The XLSX file is 1.15 MB in size.

##### Heap utilization statistics [MB] #####
Used Memory:13 MB
Free Memory:9 MB
Total Memory:23 MB
Max Memory:247 MB
like image 648
Web User Avatar asked Jun 21 '12 19:06

Web User


People also ask

Does Apache POI support XLSX?

Apache POI is able to handle both XLS and XLSX formats of spreadsheets. Some important points about Apache POI API are: Apache POI contains HSSF implementation for Excel '97(-2007) file format i.e XLS. Apache POI XSSF implementation should be used for Excel 2007 OOXML (.

Does Apache POI help to read Excel file?

To Read and Write Excel file in Java, Apache provides a very famous library POI. This library is capable enough to read and write both XLS and XLSX file format of Excel. To read XLS files, an HSSF implementation is provided by POI library.


2 Answers

The method using the above code has a single parameter - FileInputStream. The first line in the code snippet is very much part of the code, but part of the invoking method. Since the method in question did not have knowledge of the Excel format or even a file extension to make an educated guess, I decided that I would first try to read the FileInputStream using HSSF API as below:

Sheet sheet = null;
try {

    POIFSFileSystem poifs = new POIFSFileSystem(inputFS);
    Workbook workbook = new HSSFWorkbook(poifs);
    sheet = workbook.getSheetAt(0);
}
catch (Exception e) {
}

if (sheet == null) {

    try {

        Workbook workbook = new XSSFWorkbook(inputFS);
        sheet = workbook.getSheetAt(0);
    }
    catch (Exception e) {
    }
}

The problem with the above code is that the state of the inputFS object during the second attempt of opening it via the XSSF API is unknown. And this yielded a read error. I replaced the above with the following code, which works fine and the issue appears to be resolved:

Sheet sheet = null;
try {

    Workbook workbook = WorkbookFactory.create(inputFS);
    sheet = workbook.getSheetAt(0);
}
catch (Exception e) {
}

I tested this with both XLS (older, binary) and XLSX (newer, XML-based) formats and it works. Thanks for everyone's help and input!

like image 147
Web User Avatar answered Sep 23 '22 02:09

Web User


The exception indicates that there's something up with your InputStream. However, if you have a file, then pass that in to POI directly!. Using an InputStream requires buffering of everything into memory, which eats up space. Since you don't need to do that buffering, don't! Avoiding that buffering ought to fix your problem anyway

If you're running with the latest nightly builds of POI, then it's very easy. Your code becomes:

File file = new File("test.xlsx");
OPCPackage opcPackage = OPCPackage.open(file);
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);

Otherwise, it's very similar:

File file = new File("test.xlsx");
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);

If you're not sure if your file is a HSSFWorkbook or XSSFWorkbook, then you can use the WorkbookFactory to open the appropriate one for you:

File file = new File("test.xlsx");
Workbook workbook = WorkbookFactory.create(file);
like image 24
Gagravarr Avatar answered Sep 23 '22 02:09

Gagravarr