
Extracting content from PDF using PHP

Tags: php, pdf

Could you please tell me how to extract content from a PDF document using PHP? Formatting is the main problem I'm facing. Is there a way to extract the content with its formatting intact and display it in an online text editor?

Thanks

Asked by jose, Nov 15 '22 14:11


1 Answer

Have a look at Xpdf.

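I suppose you could do

$text = shell_exec('pdftotext -layout ' . escapeshellarg($pdffile) . ' -');

The trailing - sends the text to stdout. Without it, pdftotext writes a .txt file next to the PDF and shell_exec() returns nothing useful. The -layout option keeps the original physical layout as far as plain text allows.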
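A minimal sketch of wrapping that call in a function, assuming the pdftotext binary (from Xpdf or Poppler) is on the PATH and the server runs a Unix-like shell. The names buildPdfToTextCommand and pdfToText are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical helper: builds the pdftotext command line for one file.
function buildPdfToTextCommand(string $pdfFile): string
{
    // escapeshellarg() quotes the path so spaces or shell metacharacters
    // in the file name cannot alter the command.
    // "-layout" keeps the physical layout; "-" sends the text to stdout.
    return 'pdftotext -layout ' . escapeshellarg($pdfFile) . ' -';
}

// Hypothetical helper: returns the extracted text, or null on failure.
function pdfToText(string $pdfFile): ?string
{
    // Refuse paths that do not point at an existing regular file.
    if (!is_file($pdfFile)) {
        return null;
    }

    // shell_exec() returns a string on success, null or false otherwise.
    $output = shell_exec(buildPdfToTextCommand($pdfFile));

    return is_string($output) ? $output : null;
}
```

Calling pdfToText('/path/to/file.pdf') would then give you either the text or null, so the caller can show an error instead of an empty editor.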

As for displaying it in an editor: which editor? Assuming that by web editor you mean an HTML editor, you can retain some of the formatting by converting the PDF to HTML. There may be other tools available, but since I use Xpdf I came across pdftohtml, a converter based on Xpdf.

Basic usage

pdftohtml -noframes -c test.pdf test.html

To get it into your favorite editor

echo file_get_contents('test.html');

You may need to wrap these calls in PHP functions or classes. You will also want some security measures, such as escaping any file name that ends up in a shell command.
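A rough sketch of such a wrapper for the HTML conversion, assuming pdftohtml is on the PATH and a Unix-like shell. The function names are hypothetical, and the exact output file name may differ between pdftohtml versions, so the code checks that the expected file exists before reading it.

```php
<?php
// Hypothetical helper: builds the pdftohtml command shown above,
// with both paths quoted for the shell.
function buildPdfToHtmlCommand(string $pdfFile, string $htmlFile): string
{
    return 'pdftohtml -noframes -c '
        . escapeshellarg($pdfFile) . ' '
        . escapeshellarg($htmlFile);
}

// Hypothetical helper: converts the PDF and returns the HTML, or null on failure.
function pdfToHtml(string $pdfFile, string $htmlFile): ?string
{
    // Refuse paths that do not point at an existing regular file.
    if (!is_file($pdfFile)) {
        return null;
    }

    // exec() fills $exitCode with the exit status of the command.
    exec(buildPdfToHtmlCommand($pdfFile, $htmlFile), $outputLines, $exitCode);

    // Treat a non-zero exit status or a missing output file as failure.
    if ($exitCode !== 0 || !is_file($htmlFile)) {
        return null;
    }

    $html = file_get_contents($htmlFile);

    return $html === false ? null : $html;
}
```

If the editor is backed by a textarea, pass the result through htmlspecialchars() before echoing it, so the markup arrives as editable content rather than being rendered by the page.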

Answered by Peter Lindqvist, Dec 17 '22 05:12