I am a Symfony developer and my web server is Linux. I already use the sfLucene plugin.
What is the simplest way of indexing PDF files for search on a Linux PHP server?
Thanks!
Coming from a Zend background, I generally recommend using Zend_Search_Lucene. The XPDF example is really straightforward and looks simple. XPDF is licensed under the GPL - if that fits your needs, go for #1!
ZF can easily be integrated into your Symfony projects, e.g. for a Twitter call.
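To illustrate the XPDF route, here is a rough sketch, not a tested recipe. It assumes the `pdftotext` binary (shipped with XPDF and also with poppler-utils) is on the PATH and that Zend Framework 1 is on your include_path. The function names, the field names (`path`, `filename`, `contents`) and the index directory are my own choices for the example, not anything sfLucene requires.

```php
<?php
// Build the pdftotext command line. "-" as the output file sends the
// extracted text to stdout; "-enc UTF-8" asks for UTF-8 output.
function pdfTextCommand($pdfPath)
{
    return 'pdftotext -enc UTF-8 ' . escapeshellarg($pdfPath) . ' -';
}

// Run pdftotext and return the extracted text, or throw on failure.
function extractPdfText($pdfPath)
{
    exec(pdfTextCommand($pdfPath), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftotext failed for $pdfPath (exit code $status)");
    }
    return implode("\n", $lines);
}

// Add one PDF to a Zend_Search_Lucene index (hypothetical helper).
function indexPdf($indexDir, $pdfPath)
{
    // Loaded here so the helpers above work without Zend Framework.
    require_once 'Zend/Search/Lucene.php';

    try {
        $index = Zend_Search_Lucene::open($indexDir);
    } catch (Zend_Search_Lucene_Exception $e) {
        // No usable index at that path yet: create a new one.
        $index = Zend_Search_Lucene::create($indexDir);
    }

    $doc = new Zend_Search_Lucene_Document();
    // Stored but not searchable: lets you link back to the file from a hit.
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('path', $pdfPath));
    // Stored and tokenized: searchable and displayable.
    $doc->addField(Zend_Search_Lucene_Field::Text('filename', basename($pdfPath), 'UTF-8'));
    // Tokenized but not stored: keeps the index small for long documents.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', extractPdfText($pdfPath), 'UTF-8'));

    $index->addDocument($doc);
    $index->commit();
}
```

You would call it as something like `indexPdf('/path/to/index', '/path/to/file.pdf')` from a Symfony task or a cron script. Scanned PDFs that contain only images will yield empty text, so they would need OCR first.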
There are many libraries for extracting text content from PDF files. With any of these, you then need to create a Lucene document from the extracted content. The most useful ones will be those that already have Lucene integration.
Apache PDFBox can create a Lucene document directly from a PDF file. It will include PDF metadata fields as well as the text content.