
How to use a specific PDF IFilter

I'm trying to extract text from PDF files using an IFilter.

The Adobe PDF IFilter that is distributed with Adobe Reader is awful: it returns HRESULT E_FAIL for many PDF documents.

The FoxIt PDF IFilter works beautifully on virtually all of the PDFs I've been using for testing.

The problem is that every time the Adobe Updater runs, it replaces the awesome FoxIt IFilter with the crappy Adobe IFilter.

I've been using the LoadIFilter function to get the registered IFilter for PDF files. Is there a way to force the Win32 API to load the FoxIt IFilter instead of the Adobe IFilter?

NOTE: This question about determining which IFilters are installed asks a related -- but not identical -- question.

dthrasher asked Mar 08 '10 19:03


1 Answer

The IFilter seems to be registered with Windows as a COM object, so you should be able to create an instance of it directly through COM rather than asking LoadIFilter for whichever filter is currently registered.

From http://msdn.microsoft.com/en-us/library/ms692565 : the filter DLL is structured so that it implements both an IFilter and an IClassFactory.

Given the CLSID of the filter you want, you should be able to instantiate its IClassFactory and have it create the IFilter.

See also http://msdn.microsoft.com/en-us/library/ms684007 and http://msdn.microsoft.com/en-us/library/ms680760
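As a rough illustration of the idea, here is a minimal Python sketch that creates a filter object from an explicit CLSID using ctypes. It relies on CoCreateInstance, which obtains the class factory for the CLSID and calls CreateInstance on it, so it is one possible way to do this rather than the only one. The helper names (`guid_from_string`, `create_filter`) are made up for this example, the CLSID of the FoxIt filter is not given in this thread and has to be looked up on your own machine, and the Windows-only part is untested here.

```python
import ctypes
import uuid

# IID of the IFilter interface from the Windows SDK.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"
CLSCTX_INPROC_SERVER = 0x1  # load the filter DLL into the calling process


class GUID(ctypes.Structure):
    """Binary layout of a Win32 GUID (16 bytes)."""
    _fields_ = [
        ("Data1", ctypes.c_uint32),
        ("Data2", ctypes.c_uint16),
        ("Data3", ctypes.c_uint16),
        ("Data4", ctypes.c_ubyte * 8),
    ]


def guid_from_string(text):
    """Parse a '{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}' string into a GUID."""
    # bytes_le matches the in-memory GUID layout on little-endian machines.
    return GUID.from_buffer_copy(uuid.UUID(text).bytes_le)


def create_filter(clsid_text):
    """Return a raw IFilter pointer (c_void_p) for the given CLSID.

    Windows only. clsid_text is an assumption of this sketch: it must be
    the CLSID under which the filter you want is registered.
    """
    # OleDLL raises OSError when a call returns a failing HRESULT.
    ole32 = ctypes.OleDLL("ole32")
    ole32.CoInitialize.argtypes = [ctypes.c_void_p]
    ole32.CoCreateInstance.argtypes = [
        ctypes.POINTER(GUID),             # rclsid
        ctypes.c_void_p,                  # pUnkOuter (no aggregation)
        ctypes.c_uint32,                  # dwClsContext
        ctypes.POINTER(GUID),             # riid
        ctypes.POINTER(ctypes.c_void_p),  # ppv
    ]

    ole32.CoInitialize(None)
    clsid = guid_from_string(clsid_text)
    iid = guid_from_string(IID_IFILTER)
    filter_ptr = ctypes.c_void_p()
    ole32.CoCreateInstance(
        ctypes.byref(clsid), None, CLSCTX_INPROC_SERVER,
        ctypes.byref(iid), ctypes.byref(filter_ptr),
    )
    return filter_ptr
```

Calling `create_filter` with the FoxIt filter's CLSID would hand back an interface pointer that does not depend on which filter is registered for .pdf. Unlike LoadIFilter, this does not bind the filter to a file, so the caller presumably still has to load the document into it (for example through one of the IPersist interfaces, if the filter supports them) before reading text. Those calls need interface definitions that are not shown in this sketch.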

Nigel Thorne answered Sep 21 '22 23:09