
How do I convert a PDF to text so I can parse that text with PHP?

I have PDFs that consist mostly of simply formatted text, and I would like to parse that text with PHP. I realize that PDF is a binary format, so I need a utility or library to convert the files to text first.

Any recommendations?

asked Jun 23 '11 09:06 by T. Brian Jones



1 Answer

Third-party software can dump the text contents of a PDF file. For example:

  • xdoc2txt (Windows-only, used in WinMerge plugins)
  • pdftotext, part of Xpdf
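
As a rough illustration of the pdftotext route, the sketch below shells out to the tool from PHP and returns the extracted text as a string. It assumes the pdftotext binary is installed and on the PATH; pdfToText is a hypothetical helper name chosen for this example, not part of any library.

```php
<?php
// Hypothetical helper (not from any library): runs pdftotext on a PDF
// and returns the extracted text. Assumes pdftotext is on the PATH.
function pdfToText(string $pdfPath): string
{
    if (!is_readable($pdfPath)) {
        throw new InvalidArgumentException("Cannot read PDF: $pdfPath");
    }

    // Passing "-" as the output file makes pdftotext write to stdout.
    // -layout tries to preserve the physical layout of the page;
    // -enc UTF-8 sets the output encoding.
    $cmd = sprintf(
        'pdftotext -layout -enc UTF-8 %s -',
        escapeshellarg($pdfPath)
    );

    // exec() appends to $lines, so start from an empty array.
    // Note that exec() strips trailing whitespace from each line.
    $lines = [];
    $exitCode = 0;
    exec($cmd, $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext failed with exit code $exitCode");
    }

    return implode("\n", $lines);
}
```

A call such as pdfToText('report.pdf') (the file name is only a placeholder) would then yield a plain string that can be parsed with the usual PHP string or regex functions. This only helps when the PDF contains a real text layer; scanned pages would need OCR instead.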

answered Sep 25 '22 00:09 by Benoit