I am using Apache Commons FileUpload to upload a .docx file on Google App Engine, as explained in this link: File upload servlet. While uploading, I also want to extract the text using the Apache POI libraries.
If I pass this to the POI API:
InputStream stream = item.openStream();
I get the following exception:
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
The extraction method is:
public static String docx2text(InputStream is) throws Exception {
    return ExtractorFactory.createExtractor(is).getText();
}
I am uploading a valid .docx document. The POI API works fine if I pass a FileInputStream instead:
FileInputStream fs = new FileInputStream(new File("C:\\docs\\mydoc.docx"));
An OLE file can be seen as a mini file system or a Zip archive: it contains streams of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of an MS Word document, which contains its text, is named “WordDocument”. An OLE file can also contain storages, which act like folders holding further streams.
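The exception message ("neither an OLE2 stream, nor an OOXML stream") refers to the leading bytes of the file. An OLE2 compound file starts with the signature D0 CF 11 E0 A1 B1 1A E1, while OOXML files such as .docx are ZIP packages that start with "PK\3\4". The sketch below peeks at those bytes with mark/reset and then rewinds. `FormatSniffer` and `detect` are hypothetical names for illustration; this is not POI's actual detection code, and a ZIP signature alone only shows the file is a ZIP container.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not part of POI): peeks at the first bytes of a stream
// to tell an OLE2 compound file from a ZIP-based OOXML package, then rewinds.
public class FormatSniffer {
    // OLE2 compound-file signature: D0 CF 11 E0 A1 B1 1A E1
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4", used by .docx/.xlsx packages
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    public static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2.length);
        byte[] header = in.readNBytes(OLE2.length); // loops until 8 bytes or EOF
        in.reset(); // rewind so the caller can still read the whole document
        if (header.length == OLE2.length && Arrays.equals(header, OLE2)) {
            return "OLE2";
        }
        if (header.length >= ZIP.length
                && Arrays.equals(Arrays.copyOf(header, ZIP.length), ZIP)) {
            return "OOXML";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) throws IOException {
        byte[] docxStart = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(docxStart));
        System.out.println(detect(in)); // prints OOXML
    }
}
```

Wrapping the stream in a `BufferedInputStream` is what makes the peek-and-rewind possible here; a raw network stream may not support `mark`.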
XSSFWorkbook is the POI class that represents an Excel 2007+ workbook in the OOXML (.xlsx) format. It belongs to the org.apache.poi.xssf.usermodel package.
I don't know POI's internal implementation, but my guess is that it needs a seekable stream. The streams returned by servlets (and by networking code in general) aren't seekable.
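One plausible way a network stream can break a header check (an assumption, not confirmed POI behavior) is a short read: a single `read(byte[])` call may legitimately return fewer bytes than requested, and such streams usually do not support `mark`/`reset`. The hypothetical `TrickleInputStream` below simulates that by returning one byte per call, and compares a naive single read with `InputStream.readNBytes`, which keeps reading until it has the requested count or hits end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration: a network-style stream that returns at most one
// byte per read() call and does not support mark/reset.
public class ShortReadDemo {
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1)); // deliver one byte at a time
        }

        @Override
        public boolean markSupported() { return false; }
    }

    // Naive header read: one read() call, trusting it to fill the whole buffer.
    static int naiveHeaderRead(InputStream in, int n) throws IOException {
        byte[] header = new byte[n];
        return in.read(header);
    }

    // Robust header read: readNBytes loops until n bytes or end of stream.
    static int robustHeaderRead(InputStream in, int n) throws IOException {
        return in.readNBytes(n).length;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(naiveHeaderRead(new TrickleInputStream(new ByteArrayInputStream(data)), 8));  // 1
        System.out.println(robustHeaderRead(new TrickleInputStream(new ByteArrayInputStream(data)), 8)); // 8
    }
}
```

A `ByteArrayInputStream` never short-reads a buffer it can fill and always supports `mark`, which is one reason buffering the upload first sidesteps this class of problem.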
Try reading the whole contents into memory and then wrapping them in a ByteArrayInputStream:
byte[] bytes = getBytes(item.openStream());
InputStream stream = new ByteArrayInputStream(bytes);
public static byte[] getBytes(InputStream is) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int len;
    byte[] data = new byte[100000];
    // Copy the stream chunk by chunk until read() signals end of stream (-1)
    while ((len = is.read(data, 0, data.length)) != -1) {
        buffer.write(data, 0, len);
    }
    buffer.flush();
    return buffer.toByteArray();
}
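On Java 9 or later, the copy loop in getBytes can be replaced by the built-in `InputStream.readAllBytes()`. The sketch below assumes Java 9+; `BufferUpload` and `toReplayableStream` are hypothetical names. Like the original getBytes, it leaves closing the source stream to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BufferUpload {
    // Java 9+ equivalent of getBytes(): read the stream to EOF into a byte array.
    public static byte[] getBytes(InputStream is) throws IOException {
        return is.readAllBytes();
    }

    // Buffer the stream and return an in-memory copy that supports mark/reset.
    public static InputStream toReplayableStream(InputStream is) throws IOException {
        return new ByteArrayInputStream(getBytes(is));
    }

    public static void main(String[] args) throws IOException {
        InputStream original = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        InputStream copy = toReplayableStream(original);
        System.out.println(copy.markSupported()); // true
        System.out.println(new String(copy.readAllBytes(), StandardCharsets.UTF_8)); // hello
    }
}
```

In the upload loop this would be used as `InputStream stream = BufferUpload.toReplayableStream(item.openStream());` before handing `stream` to POI.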
The issue is solved by buffering each uploaded file in memory before passing it to POI:
while (iterator.hasNext()) { // Apache Commons FileUpload code
    FileItemStream item = iterator.next();
    InputStream stream = item.openStream();
    // IOUtils.toByteArray (e.g. from Commons IO) reads the whole upload into memory
    ByteArrayInputStream bs = new ByteArrayInputStream(IOUtils.toByteArray(stream));
    POITextExtractor extractor = ExtractorFactory.createExtractor(bs);
    System.out.println(extractor.getText());
}
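Because the loop above holds each upload entirely in memory, very large files could exhaust the instance's heap. A minimal sketch of a size-capped read follows; `CappedBuffer`, `readCapped` and the 10 MB `MAX_UPLOAD_BYTES` value are hypothetical choices, not values from the original answer or from App Engine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CappedBuffer {
    // Hypothetical limit; pick one that fits your instance's memory budget.
    static final int MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

    // Reads the whole stream into memory, failing fast once it exceeds maxBytes.
    public static byte[] readCapped(InputStream is, int maxBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int total = 0;
        int len;
        while ((len = is.read(chunk)) != -1) {
            total += len;
            if (total > maxBytes) {
                throw new IOException("upload exceeds " + maxBytes + " bytes");
            }
            buffer.write(chunk, 0, len);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readCapped(new ByteArrayInputStream(new byte[100]), MAX_UPLOAD_BYTES).length); // 100
        try {
            readCapped(new ByteArrayInputStream(new byte[2048]), 1024);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // upload exceeds 1024 bytes
        }
    }
}
```

Commons FileUpload can also enforce limits itself via `setSizeMax` and `setFileSizeMax` on the upload object, which may be simpler than a hand-rolled cap.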