 

Search through PDF files with PHP

Tags:

php

search

pdf

I'm trying to find a way to search inside PDF files. I came across the PHP PDF class, but I can't seem to find any function for reading or searching a file stream.

So, naive as I am, I tried to simply read the file with file_get_contents(); obviously, the output looks encrypted ;)

So my question: is there any way to search through PDF files? I'm looking for script-only, free or open-source solutions, not an expensive commercial library.

Ben Fransen asked Dec 10 '09 16:12





2 Answers

XPDF? Its pdftotext tool converts a PDF to plain text, which you can then search from PHP.
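As a minimal sketch of the XPDF route, assuming the `pdftotext` binary (from XPDF, or the compatible one shipped with Poppler) is installed and on the `PATH`, a script can extract the text and then search it line by line. The function names `pdfToText()` and `searchText()` are hypothetical, not part of any library.

```php
<?php
// Hypothetical helpers. Assumes the `pdftotext` binary (XPDF or Poppler)
// is installed and on the PATH.

/** Extract plain text from a PDF; the trailing "-" sends output to stdout. */
function pdfToText(string $pdfPath): string
{
    $cmd = 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
    exec($cmd, $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftotext failed ($status) for $pdfPath");
    }
    return implode("\n", $lines);
}

/** Case-insensitive search: returns [lineNumber => line] for each hit (1-based). */
function searchText(string $text, string $needle): array
{
    $hits = [];
    foreach (preg_split('/\R/', $text) as $i => $line) {
        if (stripos($line, $needle) !== false) {
            $hits[$i + 1] = $line;
        }
    }
    return $hits;
}
```

For example, `searchText(pdfToText('report.pdf'), 'invoice')` would list every matching line of a hypothetical `report.pdf`. The line numbers refer to the extracted text, not to pages, and `stripos()` only folds ASCII case; `mb_stripos()` is the multibyte-aware alternative for non-English text.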

There is a blog post here that may be of help.

There seems to be some code here that could help: a simple class that reads a PDF into plain text. I'm not sure whether it supports decryption.
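To give an idea of what such a "PDF to plain text" class does internally, here is a deliberately naive sketch, not a real PDF parser. It assumes the text sits in content streams that are either uncompressed or FlateDecode-compressed (zlib data, which `gzuncompress()` can inflate) and is drawn with literal strings via the `Tj`/`TJ` operators. The function names are hypothetical, and encrypted files, hex strings, custom font encodings and compressed object streams are not handled.

```php
<?php
// Naive sketch, NOT a full PDF parser: only uncompressed or FlateDecode
// content streams, only literal strings shown with Tj / TJ.

function naivePdfText(string $pdfBytes): string
{
    $text = '';
    // Capture every stream body between "stream" and "endstream".
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdfBytes, $m);
    foreach ($m[1] as $raw) {
        // FlateDecode data is zlib-wrapped, which gzuncompress() can inflate;
        // if inflation fails, treat the stream as uncompressed.
        $data = @gzuncompress($raw);
        $text .= extractShownStrings($data === false ? $raw : $data);
    }
    return $text;
}

// Return the literal strings shown by Tj / TJ, one operator per line.
function extractShownStrings(string $content): string
{
    $literal = '\(((?:\\\\.|[^\\\\)])*)\)';   // "( ... )" allowing escapes
    $pattern = '/' . $literal . '\s*Tj|\[([^\]]*)\]\s*TJ/s';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

    $out = '';
    foreach ($matches as $set) {
        if (($set[2] ?? '') !== '') {
            // TJ: an array mixing strings and kerning numbers; keep the strings.
            preg_match_all('/' . $literal . '/', $set[2], $parts);
            $shown = implode('', $parts[1]);
        } else {
            $shown = $set[1];
        }
        // Undo the simplest PDF string escapes: \( \) and \\.
        $out .= strtr($shown, ['\\(' => '(', '\\)' => ')', '\\\\' => '\\']) . "\n";
    }
    return $out;
}
```

Calling `naivePdfText(file_get_contents('doc.pdf'))` on a hypothetical simple file returns the recovered text, which can then be searched with `stripos()`. Real-world PDFs break these assumptions often enough that `pdftotext` or a proper parser is far more reliable.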

There are also a number of resources in the PHP documentation that may help you.

FPDF and FPDI may also help; after some research, they are probably your best bet.

Daniel May answered Sep 25 '22 00:09



A PHP search engine called Sphider has the option of adding PDF search via XPDF. You can then customise the result templates to fit in with the rest of your site (if applicable).

akamike answered Sep 25 '22 00:09
