
How to read PDF files using Java? [closed]

Tags: java, pdf

I want to read some text data from a PDF file using Java. How can I do that?

asked Jan 24 '11 17:01 by yohan.jayarathna


People also ask

Can I read a PDF in Java?

It is not difficult to read PDF files in Java using libraries that are readily available. Reading PDF files allows you to write Java programs that can process the text in those files. One option for reading PDF files is the free, open-source PDFBox library available from Apache.


1 Answer

PDFBox is the best library I've found for this purpose. It's comprehensive and really quite easy to use if you're just doing basic text extraction. Examples can be found here.
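For basic extraction, something along these lines should be enough. This is a minimal sketch, assuming PDFBox 2.x is on the classpath; the class name `PdfTextDump` and the fallback file name `example.pdf` are made up for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTextDump {

    // Returns the text of every page in the document, in page order.
    static String extractText(PDDocument document) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input path; pass your own file as the first argument.
        File pdf = new File(args.length > 0 ? args[0] : "example.pdf");

        // PDFBox 2.x entry point. In PDFBox 3.x, Loader.loadPDF(pdf) replaces it.
        try (PDDocument document = PDDocument.load(pdf)) {
            System.out.println(extractText(document));
        }
    }
}
```

The try-with-resources block closes the document even if extraction throws, which matters because PDFBox can hold on to file handles and scratch buffers.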

The documentation does explain this, but one thing to watch out for is that the page numbers passed to setStartPage() and setEndPage() are both inclusive. I skipped over that explanation the first time round, and it then took me a while to realise why I was getting more than one page back with each call!
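To pull out a single page, set both bounds to the same page number. Here is a rough sketch, again assuming PDFBox 2.x; `PdfPageByPage`, `extractPage` and `example.pdf` are illustrative names, not part of the library. Page numbers in PDFTextStripper start at 1.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfPageByPage {

    // Extracts exactly one page. Page numbers are 1-based and both bounds
    // are inclusive, so start and end are set to the same value.
    static String extractPage(PDDocument document, int pageNumber) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(pageNumber);
        stripper.setEndPage(pageNumber);
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input path; pass your own file as the first argument.
        File pdf = new File(args.length > 0 ? args[0] : "example.pdf");

        try (PDDocument document = PDDocument.load(pdf)) {
            for (int page = 1; page <= document.getNumberOfPages(); page++) {
                System.out.println("--- page " + page + " ---");
                System.out.println(extractPage(document, page));
            }
        }
    }
}
```

Calling setEndPage(page + 1) here, as you would with an exclusive upper bound, returns two pages per call, which is the mistake described above.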

iText is another alternative, and it is also available for C#, though I've personally never used it. It's more low-level than PDFBox, so it's less suited to the job if all you need is basic text extraction.

answered Sep 19 '22 21:09 by Michael Berry