
How to obtain PDF table of contents (outline) data in iOS (iPad)?

I am building an iPad application that displays PDFs, and I'd like to be able to display the table of contents and let the user navigate to the relevant pages.

I have invested several hours in research at this point, and it appears that since PDFKit is [not supported in iOS], my only option is to parse the PDF metadata manually.

I have looked at several solutions, but all of them are silent on one point: how to associate a page in the "outline" metadata with the real page number of the item. I have examined my PDF document with [the Voyeur tool] and I can see the outline in the tree.

[This solution] helped me figure out how to navigate down the Outline/A/S/D tree to find the "Dest" object, but it performs some kind of object comparison using [self.pages indexOfObjectIdenticalTo:destPageDic] that I don't understand.

I have read the [official PDF spec from Adobe], and section "12.3.2.3 Named Destinations" describes the way that an outline entry can point to a page:

Instead of being defined directly with the explicit syntax shown in Table 151, a destination may be referred to indirectly by means of a name object (PDF 1.1) or a byte string (PDF 1.2).

And continues with this line which is utterly incomprehensible to me:

The value of this entry shall be a dictionary in which each key is a destination name and the corresponding value is either an array defining the destination, using the syntax shown in Table 151, or a dictionary with a D entry whose value is such an array.

This refers to page 366, "12.3.2.2 Explicit Destinations" where a table describes a page: "In each case, page is an indirect reference to a page object"

So is the result of CGPDFDocumentGetPage or CGPDFPageGetDictionary an "indirect reference to a page object"?

I found a [thread on lists.apple.com] that discusses this. [This comment] implies that you can take the address (in memory?) of the CGPDFPageGetDictionary result for a given page and compare it to the pages in the "Outline" tree of the PDF metadata.

However, when I look at the addresses of page objects in the Outline tree and compare them to the addresses of the document's pages, they are never the same. The line used in that thread, "TTDPRINT(@"%d => %p", k+1, dict);", prints "dict" as a pointer in memory. There's no reason to believe that an object returned there would be the same as one returned somewhere else. They'd be in different places in memory!

My last hope was to look at the source code of Apple's command line "outline" tool [mentioned in this book] (as [suggested by this thread]), but I can't find it anywhere.

Bottom line: does anyone have some insight into how PDF outlines work, or know of some open source code (preferably Objective-C) that reads PDF outlines?

ARGG: I had all kinds of links posted here, but apparently a new user can only post one link at a time

Bob asked Dec 12 '10 18:12


1 Answer

The page dictionary of a page (CGPDFPageGetDictionary applied to the result of CGPDFDocumentGetPage) is the same object as the indirect page reference you get when resolving a destination in an outline item. Both are CGPDFDictionaryRef values, so you can compare them using ==. When you have a CGPDFDictionaryRef that you want to know the page number of, you can do something like this:

CGPDFDocumentRef doc = ...;
CGPDFDictionaryRef outlinePageRef = ...;  // page dictionary taken from the outline destination
size_t pageCount = CGPDFDocumentGetNumberOfPages(doc);
for (size_t p = 1; p <= pageCount; p++) {  // page numbers are 1-based
  CGPDFPageRef page = CGPDFDocumentGetPage(doc, p);
  // Compare the page's dictionary, not the CGPDFPageRef itself.
  if (CGPDFPageGetDictionary(page) == outlinePageRef) {
    printf("found the page number: %zu\n", p);
    break;
  }
}

An explicit destination, however, is not a page but an array whose first element is the page. The remaining elements describe how the page is displayed (fit mode, scroll position on the page, zoom).
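
To make the identity comparison less mysterious, here is a small portable C model of the idea. It is only a sketch: PageDict, Destination, Document and page_number_for_destination are hypothetical stand-ins invented for illustration, not Core Graphics types. In real code, element 0 of the destination array would come from something like CGPDFArrayGetDictionary, and the per-page dictionaries from CGPDFPageGetDictionary. The point it tries to show is that the outline stores a reference to the very same page object the document holds, so comparing addresses works, while a different object with identical contents does not match.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a page dictionary (hypothetical, not a Core Graphics type). */
typedef struct {
    int media_box_width;
} PageDict;

/* Stand-in for an explicit destination array such as [page /XYZ left top zoom]. */
typedef struct {
    const PageDict *page;   /* element 0: the resolved reference to the page */
    const char *fit;        /* element 1: fit mode name, e.g. "XYZ" */
    double left, top, zoom; /* remaining elements: position and zoom */
} Destination;

/* Stand-in for a document: its page dictionaries in reading order. */
typedef struct {
    const PageDict *const *pages;
    size_t count;
} Document;

/* Returns the 1-based page number the destination points at, or 0 if the
 * page is not part of the document. Pages are matched by identity (address),
 * never by comparing their contents. */
size_t page_number_for_destination(const Document *doc, const Destination *dest)
{
    assert(doc != NULL);
    if (dest == NULL || dest->page == NULL) {
        return 0;
    }
    for (size_t i = 0; i < doc->count; i++) {
        if (doc->pages[i] == dest->page) {
            return i + 1;
        }
    }
    return 0;
}
```

With a three-page Document and a Destination whose page field points at the second entry, the function returns 2. A Destination pointing at a separately allocated PageDict with the same contents returns 0, which is why the addresses only line up when both sides are resolved from the same open document.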

omz answered Oct 27 '22 01:10