
How do I validate that an NSData is a PDF?

I'm working on a feed-reading iPhone app that displays NSData objects (HTML and PDF) in a UIWebView, and I am hitting a snag in some PDF validation logic. I have an NSData object which I know contains a file with a .pdf extension. I would like to stop invalid PDFs from getting any further. Here's my first attempt at validation code, which seems to work for the majority of cases:

// pdfData is an NSData *
NSData *validPDF = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
// Check the length first: subdataWithRange: raises NSRangeException if pdfData is shorter than 4 bytes
if (!([pdfData length] >= 4 && [[pdfData subdataWithRange:NSMakeRange(0, 4)] isEqualToData:validPDF])) {
    // error
}

Unfortunately, a new PDF was uploaded a few days ago. It is valid in the sense that the UIWebView displays it fine, yet it fails my validation test. I have tracked the issue down to a run of garbage bytes at the beginning of the file, with the %PDF starting midway through the 14th group of hex characters (the 0x25, or %, is at zero-based byte offset 54):

%PDF: 25504446
Breaking PDF: 00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31 etc...

What is the best practice for validating NSData to be a PDF?
What might be wrong with this particular PDF (it claims it was encoded by PaperPort 11.0, whatever that is)?

Thanks,

Mike

asked Sep 29 '10 16:09 by TahoeWolverine


1 Answer

This question seems quite helpful:

Detect if PDF file is correct (header PDF)

or, if you're feeling adventurous, here's the spec (from the Adobe site here)
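
To make that a bit more concrete: as far as I recall, the implementation notes in Adobe's PDF Reference say that Acrobat viewers only require the header to appear somewhere within the first 1024 bytes of the file, which would explain why a file with junk in front of %PDF still renders. Below is a rough sketch of a check with the same leniency. It is plain C so it can be dropped into an Objective-C file as-is; the function name `pdf_header_offset` and the `PDF_HEADER_WINDOW` constant are made up for illustration, and the 1024-byte window is an assumption borrowed from Acrobat's behaviour, not something UIWebView is documented to follow.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: accept the header anywhere in the first 1024 bytes,
   mirroring the leniency described for Acrobat viewers. */
#define PDF_HEADER_WINDOW 1024

/* Hypothetical helper: returns the zero-based offset of "%PDF-" if the
   whole marker lies within the first PDF_HEADER_WINDOW bytes, else -1. */
long pdf_header_offset(const unsigned char *bytes, size_t length)
{
    static const char marker[] = "%PDF-";
    const size_t marker_len = sizeof marker - 1; /* exclude the trailing NUL */
    size_t limit = length < PDF_HEADER_WINDOW ? length : PDF_HEADER_WINDOW;
    size_t i;

    if (bytes == NULL || limit < marker_len) {
        return -1; /* nothing to scan, or too short to hold the marker */
    }

    /* Linear scan; the window is tiny, so nothing cleverer is needed. */
    for (i = 0; i + marker_len <= limit; i++) {
        if (memcmp(bytes + i, marker, marker_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

From the question's code this could be called as `pdf_header_offset([pdfData bytes], [pdfData length])`, treating any result of 0 or more as "looks like a PDF". A non-zero offset tells you how many junk bytes precede the header, in case you want to strip them before handing the data on. This only checks the header; it says nothing about whether the rest of the file is well formed.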

answered Oct 04 '22 18:10 by deanWombourne