I'm trying to find a way to search inside PDF files. I came across the PHP PDF class, but I can't seem to find any function for reading/searching a file stream.
So, naive as I am, I tried to simply get a stream using file_get_contents(); obviously the output looks encrypted ;)
So my question: is there any way to search through PDF files? I'm looking for script-only / free / open source solutions, not for some expensive commercial library to buy.
When a PDF is opened in the Acrobat Reader (not in a browser), the search window pane may or may not be displayed. To display the search/find window pane, use "Ctrl+F".
Note: PHP is not actually reading the PDF file here. It does not recognize the file as a PDF; it only passes the file to the browser, which reads it.
MPDF is an HTML-to-PDF generator based on FPDF, one of the original PHP PDF conversion libraries. It has excellent documentation. Unfortunately, it also lacks support for JavaScript and is slow, especially with large tables. MPDF has added support for custom HTML tags to improve page break and header handling.
XPDF?
There is a blog post here that may be of help.
There seems to be some code here that could help - a simple class that reads a PDF into plaintext. Unsure if it supports decryption.
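For what it's worth, the output of file_get_contents() usually looks "encrypted" because the page content streams in a PDF are typically compressed with the Flate (zlib) filter, not because the file is actually encrypted. Below is a minimal sketch of the "read a PDF into plaintext" idea under that assumption. It is not the class linked above; the function names pdfInflateStreams() and pdfNaiveSearch() are made up for illustration. It will not work on genuinely encrypted files, on PDFs that use other filters or object streams, or on fonts whose text is stored as hex strings or custom encodings.

```php
<?php
// Hypothetical helper: return the inflated bodies of all Flate-compressed streams.
function pdfInflateStreams(string $raw): array
{
    $out = [];
    // Stream data sits between "stream" + EOL and "endstream".
    if (preg_match_all('/stream\r?\n(.*?)endstream/s', $raw, $m)) {
        foreach ($m[1] as $body) {
            // The EOL before "endstream" is not part of the data, but a real data
            // byte can look like one, so try the body as-is and with 1 or 2 bytes trimmed.
            foreach ([$body, substr($body, 0, -1), substr($body, 0, -2)] as $candidate) {
                $plain = @gzuncompress($candidate);
                if ($plain !== false) {
                    $out[] = $plain;
                    break;
                }
            }
        }
    }
    return $out;
}

// Hypothetical helper: case-insensitive search in the literal strings "(...)"
// that text-showing operators such as Tj and TJ take as operands.
function pdfNaiveSearch(string $raw, string $needle): bool
{
    foreach (pdfInflateStreams($raw) as $stream) {
        // Match parenthesised strings, allowing backslash escapes inside them.
        if (preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)/s', $stream, $m)) {
            // Escapes are left as-is; fragments are joined with a space,
            // which can split words that the PDF stores in kerned pieces.
            if (stripos(implode(' ', $m[1]), $needle) !== false) {
                return true;
            }
        }
    }
    return false;
}
```

Usage would be something like pdfNaiveSearch(file_get_contents('some.pdf'), 'invoice'), where some.pdf is a placeholder. For anything beyond simple PDFs, a real extractor is the safer route.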
There are also a number of resources in the PHP documentation that may help you.
FPDF and FPDI may also help. Probably your best bet after some research.
A PHP search engine called Sphider has the option of adding PDF search via XPDF. You can then customise the result templates to fit in with the rest of your site (if applicable).
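If you only need the XPDF part without a full search engine, a rough sketch could look like the following. It assumes the pdftotext command-line tool (shipped with Xpdf, and also with Poppler) is installed and on the PATH, and that exec() is enabled in your PHP configuration. The function name pdftotextSearch() is made up for illustration.

```php
<?php
// Hypothetical helper: search a PDF's text via the external pdftotext tool.
// Returns true/false for match/no match, or null if the text could not be extracted.
function pdftotextSearch(string $pdfPath, string $needle): ?bool
{
    if (!is_readable($pdfPath)) {
        return null;
    }
    // "-" as the output file makes pdftotext write the text to stdout.
    $cmd = 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
    exec($cmd, $lines, $exitCode);
    if ($exitCode !== 0) {
        return null; // tool missing, damaged file, or other failure
    }
    return stripos(implode("\n", $lines), $needle) !== false;
}
```

For a site search you would probably extract the text once per file and index it, rather than running pdftotext on every query.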