
Is there a C++ library to extract text from a PDF file like PDFBox for Java? [closed]


Last year I wrote a Java application that uses PDFBox to extract the raw text from some PDF files, and I now need to port it to C++.

I'd like to know the best C++ alternative for accomplishing this.

I'll give an example in case it helps:

Most files will look like this: http://www.jumbala.net/backup/league.pdf

With PDFBox, each line read from page 2 and most of page 3 of that file comes out as all the data of one row separated by spaces, instead of being laid out in a grid as it is in the PDF.

So the first relevant line in page 2 would look like this:

FB 847 - Tremblay, Gérard 179,63 56 16167 90 268 s27 p3 669 s14 199 223 193 615 

or something close to it. The order of the values may differ slightly, but I don't mind as long as similar lines produce output in the same format, since I just parse them and store the values I need in different variables.

So, knowing all of that, is there a library that I can use in a C++ program to get similar results?

Edit: After trying the code from sacredFaith's link, http://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file, I get strange output like this for the example file I mentioned earlier:

http://www.jumbala.net/backup/league.pdf.txt

The parts I actually need are among the garbled characters at the beginning. Using Save As... Text (Accessible) in Adobe Acrobat Reader X, I get the following result:

http://www.jumbala.net/backup/league_good.pdf.txt

That is approximately what I get in Java with PDFBox, and what I want as output in C++.

Adam Smith asked Mar 30 '12 23:03




1 Answer

Xpdf is a C++ application/library which includes tools to extract plain text from a PDF file.
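One low-effort way to try this from a C++ program is to run Xpdf's `pdftotext` command-line tool and read its output through a pipe. The following is only a sketch under several assumptions: `pdftotext` is installed and on the PATH, the platform provides POSIX `popen`, and the helper names `buildPdftotextCommand` and `extractText` are invented for this example. The `-layout` option asks the tool to preserve the physical layout of the page; `-raw` is another mode worth comparing against the PDFBox output.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: build the shell command line. The path is wrapped in
// single quotes, and embedded single quotes are escaped, so a POSIX shell
// passes it to pdftotext as one argument.
std::string buildPdftotextCommand(const std::string& pdfPath) {
    std::string quoted = "'";
    for (char c : pdfPath) {
        if (c == '\'') quoted += "'\\''";
        else quoted += c;
    }
    quoted += "'";
    // "-" as the output file name makes pdftotext write to stdout.
    return "pdftotext -layout " + quoted + " -";
}

// Hypothetical helper: run pdftotext and return everything it prints.
std::string extractText(const std::string& pdfPath) {
    FILE* pipe = popen(buildPdftotextCommand(pdfPath).c_str(), "r");
    if (!pipe) throw std::runtime_error("could not start pdftotext");

    std::string text;
    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        text.append(buffer, n);
    }

    // A non-zero status means the tool was missing or reported an error.
    if (pclose(pipe) != 0) throw std::runtime_error("pdftotext failed");
    return text;
}
```

Calling `extractText("league.pdf")` would then return the text of the whole file as one string, which can be split into lines and parsed. Linking against the Xpdf sources directly is also possible, but it is more involved than shelling out to the tool.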

Charles Salvia answered Nov 17 '22 07:11