I am setting up a Java project where I use PDFBox to extract images from PDF files. Since I already use tika-app for my other functions, I decided to go with the PDFBox that is bundled inside tika-app-1.20.jar.
I first tried adding jai-imageio-core-1.3.1.jar explicitly. Since tika-app already comes bundled with this jar, I also tried with the tika-app jar alone. Both attempts fail the same way.
The line that throws the error:
PDXObject object = resources.getXObject(cosName);
The stack trace of the error:
org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:163)
at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:115)
at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:236)
at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:140)
at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:426)
I am quite sure that jai-imageio-core is bundled in the tika-app jar, yet it appears to be invisible when I run the code.
ERROR o.a.p.contentstream.PDFStreamEngine 890 – Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
This means that no Image I/O plugin capable of reading JPEG2000 images is available at runtime. That is most likely where the problem lies.
IMPORTANT: If you do not "cd" to the JRE (or higher version) directory before performing the extraction, then JAI Image I/O Tools will not install. You may also need appropriate write permissions into the JRE directory in order to install.
Java Advanced Imaging Image I/O Tools will be installed under the $JDK directory with the files in the locations given in the following table. The same files will appear in analogous locations in the JRE installation. In the CLASSPATH installation they will be located in the installation directory and its lib subdirectory.
You may delete the jai_imageio-1_0_01-*.exe files after you are done with the extraction. IMPORTANT: The executable will try to install JAI Image I/O Tools within the latest JDK (or JRE) version that it finds.
I stumbled upon this error as well; it is mentioned in the PDFBox documentation. You need to add the following dependencies to your pom.xml:
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.3.0</version>
</dependency>
<!-- Optional for you; just to avoid the same error with JBIG2 images -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.3</version>
</dependency>
If you are using Gradle:
dependencies {
    implementation 'com.github.jai-imageio:jai-imageio-core:1.4.0'
    implementation 'com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0'
    // Optional for you; just to avoid the same error with JBIG2 images
    implementation 'org.apache.pdfbox:jbig2-imageio:3.0.3'
}
It turns out that jai-imageio-core alone is not enough: an additional jar, jai-imageio-jpeg2000, is required to decode JPEG 2000 (jp2k) images.
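To check whether a JPEG 2000 reader is actually visible on the runtime classpath, a small diagnostic along these lines may help. It is only a sketch using the standard javax.imageio API; the class name ImageReaderCheck and the helper hasReader are made up for illustration, and the format name "JPEG2000" is assumed from the wording of the error message above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper class, not part of PDFBox or Tika.
public class ImageReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the given format name.
    public static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Ask ImageIO to rescan the classpath for reader/writer plugins.
        ImageIO.scanForPlugins();

        // Print every format name that currently has a registered reader.
        System.out.println("Registered reader formats: "
                + new TreeSet<>(Arrays.asList(ImageIO.getReaderFormatNames())));

        // Assumption: "JPEG2000" is the format name the JPEG 2000 plugin registers.
        System.out.println("JPEG2000 reader present: " + hasReader("JPEG2000"));
    }
}
```

Run it with the same classpath as the failing application. If the last line reports false, the jpeg2000 plugin jar is not being picked up, whatever else is bundled.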