
LoadIFilter() fails on all PDFs (but MS's filtdump.exe doesn't.)

I'm trying to write a C# utility that mimics the behavior of filtdump.exe from the Windows Search SDK (since filtdump doesn't appear to be redistributable itself.) I'm running into a combination of contradictory and/or non-existent documentation and technical problems I can't seem to track down. I'm hoping someone can help eliminate one or the other of those hurdles...

According to MSDN, filtdump uses ILoadFilter::LoadIFilter to load its IFilter. I contend that MSDN is lying, since it also claims ILoadFilter::LoadIFilter only exists on Windows 7, yet filtdump works fine on earlier OSes. Process Monitor indicates that it's actually calling LoadIFilter() from query.dll, so that's what I'm doing:

public static class NativeMethods
{
    // From Windows SDK v7.1, NTQuery.h
    [DllImport("query.dll", CharSet = CharSet.Unicode)]
    public static extern int LoadIFilter(
        string pwcsPath,
        [MarshalAs(UnmanagedType.IUnknown)] 
        ref object pUnkOuter,
        ref IFilter ppIUnk);
}

object iUnknown = null;
IFilter filter = null;
var result = NativeMethods.LoadIFilter(args[0], ref iUnknown, ref filter);
if (result != ResultCodes.S_OK)
{
  Console.WriteLine("Failed to load an IFilter for {0}: {1}", args[0], result);
  return;
}

For the most part, this application and filtdump give me the same results: both can open and extract text from plain-text files, Word documents, and Outlook emails, and both fail on the same set of other documents that have no IFilter. However, PDFs are giving me a problem. Filtdump manages to open and extract the text from most of the PDFs I've thrown at it, but every single PDF I try with my own application gives me an HRESULT of 0x80004005 (E_FAIL).

This is the same error as in this question, but I'm getting it on every PDF and filtdump is not, so I know that the IFilter works on at least some documents. Has anyone done this kind of thing with PDFs before who can see what I'm doing wrong?

Michael Edenfield asked Aug 24 '11 15:08



2 Answers

You may want to see this blog post. In short, v10 of Adobe's PDF filter uses a whitelist of applications that are allowed to load it, supposedly as a “security measure”. The whitelist includes Microsoft's diagnostic tools such as filtdump.exe, which would explain why filtdump succeeds while a custom application fails.

chase answered Oct 22 '22 02:10



LoadIFilter fails because the Adobe PDF filter is marked as STA, while C# applications run in an MTA by default, so the PDF filter cannot be loaded. Try making your application STA and then loading the PDF filter.
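
As a rough sketch of what "making the application STA" can look like, there are two common options: mark Main with [STAThread], or run the filter work on a dedicated thread whose apartment state is set to STA before it starts. The names StaExample and RunInSta below are hypothetical helpers made up for illustration; the call to LoadIFilter would go inside the delegate passed to RunInSta. This assumes Windows, since COM apartments do not exist on other platforms.

```csharp
using System;
using System.Threading;

public static class StaExample
{
    // Hypothetical helper: runs the given action on a dedicated STA thread
    // and blocks until it has finished.
    public static void RunInSta(Action action)
    {
        Exception error = null;
        var thread = new Thread(() =>
        {
            try { action(); }
            catch (Exception ex) { error = ex; } // hand the failure back to the caller
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
    }

    // Option 1: [STAThread] asks the runtime to make the main thread STA.
    [STAThread]
    public static void Main(string[] args)
    {
        Console.WriteLine("Main thread apartment: {0}",
            Thread.CurrentThread.GetApartmentState());

        // Option 2: do the COM work on an explicit STA thread.
        RunInSta(() =>
        {
            // The LoadIFilter call and all use of the returned IFilter
            // would go here, so that they happen on the STA thread.
            Console.WriteLine("Worker thread apartment: {0}",
                Thread.CurrentThread.GetApartmentState());
        });
    }
}
```

The explicit-thread variant is useful when the code runs somewhere the entry point is not under your control (a service or a library, for example), because [STAThread] only affects the main thread of the process.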


ajay answered Oct 22 '22 02:10
