
How to extract text from the PDF document? [closed]

How can I extract text from a PDF document using PHP?

(I can't use other tools because I don't have root access.)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Sfisioza asked Aug 09 '11 16:08




1 Answer

Download class.pdf2text.php from https://pastebin.com/dvwySU1a or from http://www.phpclasses.org/browse/file/31030.html (registration required).

Code:

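    // Load the PDF2Text class downloaded above
    include('class.pdf2text.php');

    $a = new PDF2Text();
    $a->setFilename('filename.pdf'); // path to the PDF to read
    $a->decodePDF();                 // parse the file and decode its text
    echo $a->output();               // print the extracted text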

  • class.pdf2text.php Project Home

  • The pdf2text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
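
For the PDF Parser fallback, a minimal sketch might look like the following. It assumes "PDF Parser" refers to the smalot/pdfparser package and that it was installed with Composer into a local vendor directory, which does not require root access. The extractPdfText helper is a hypothetical wrapper written for this example and is not part of the library.

```php
<?php
// Sketch only: assumes smalot/pdfparser was installed with
// `composer require smalot/pdfparser` into ./vendor (no root needed).
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper (not part of the library): returns the text of a PDF,
 * or null when the library is missing, the file is unreadable,
 * or parsing fails.
 */
function extractPdfText(string $path): ?string
{
    if (!class_exists(\Smalot\PdfParser\Parser::class) || !is_readable($path)) {
        return null;
    }
    try {
        $parser = new \Smalot\PdfParser\Parser();
        // parseFile() returns a Document; getText() concatenates the text of all pages
        return $parser->parseFile($path)->getText();
    } catch (\Throwable $e) {
        return null; // encrypted or malformed PDFs can throw
    }
}

$text = extractPdfText('filename.pdf');
echo $text ?? "Could not extract text\n";
```

Returning null instead of letting the exception escape keeps the calling code simple when some PDFs in a batch cannot be parsed; whether Unicode text comes out correctly still depends on how each PDF embeds its fonts.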


Pedro Lobito answered Sep 18 '22 21:09