Last year, I wrote a Java application that uses PDFBox to extract the raw text from some PDF files, and I now need to port that application to C++.
What is the best C++ alternative for doing this?
I'll give an example in case it helps:
Most files will look like this: http://www.jumbala.net/backup/league.pdf
With PDFBox, each row on page 2 and most of page 3 of that file comes out as a single line of text, with the values separated by spaces instead of being laid out in a grid as they are in the PDF.
So the first relevant line in page 2 would look like this:
FB 847 - Tremblay, Gérard 179,63 56 16167 90 268 s27 p3 669 s14 199 223 193 615
or something close to it. The order in which the values appear varies slightly, but I don't care about that as long as similar lines are output consistently, since I just parse them and store the values I need in different variables.
So, knowing all of that, is there a library that I can use in a C++ program to get similar results?
Edit: After trying the code from sacredFaith's link (http://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file), I get strange output for the example file I mentioned earlier:
http://www.jumbala.net/backup/league.pdf.txt
The parts I actually need are in the garbled characters at the beginning. Using Save As... Text (Accessible) in Adobe Acrobat Reader X, I get the following result:
http://www.jumbala.net/backup/league_good.pdf.txt
Which is approximately what I get in Java using PDFBox and what I want to get as output in C++.
Xpdf is an open-source PDF viewer and toolkit written in C++. It includes pdftotext, a command-line tool that extracts plain text from a PDF file.
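One low-effort way to try this from a C++ program is to call Xpdf's pdftotext command-line tool and read its output, rather than linking against Xpdf's internal classes. The following is only a sketch under a few assumptions: pdftotext is installed and on the PATH, the platform provides POSIX popen (on Windows you would likely need _popen instead), and the helper names shellQuote, buildPdfToTextCommand and extractPdfText are made up for this example. The -layout and -enc options and the "-" output name (meaning stdout) are pdftotext options; whether -layout gives lines close enough to the PDFBox output for this particular file is something you would have to check.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: quote a path for a POSIX shell by wrapping it in
// single quotes and escaping any single quotes it contains.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: build the pdftotext command line.
// "-" as the output file name tells pdftotext to write to stdout.
std::string buildPdfToTextCommand(const std::string& pdfPath, bool keepLayout) {
    std::string cmd = "pdftotext -enc UTF-8 ";
    if (keepLayout) cmd += "-layout ";  // try to keep each table row on one line
    cmd += shellQuote(pdfPath) + " -";
    return cmd;
}

// Hypothetical helper: run pdftotext and return everything it prints.
// Assumes POSIX popen/pclose are available.
std::string extractPdfText(const std::string& pdfPath, bool keepLayout = false) {
    const std::string cmd = buildPdfToTextCommand(pdfPath, keepLayout);
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) throw std::runtime_error("could not start pdftotext");

    std::string text;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) {
        text.append(buf, n);
    }
    if (pclose(pipe) != 0) throw std::runtime_error("pdftotext reported an error");
    return text;
}
```

Calling extractPdfText("league.pdf", true) would then return the whole document as one UTF-8 string, which can be split on newlines and parsed line by line in the same way as the PDFBox output.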