
Combining PDF with GhostScript: Using Original Bookmarks with corrected page numbers

I am using

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf  -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf

to create a single PDF document from a series of PDF documents. I was going to write a new table of contents and include it using the pdfmark mechanism. Then I noticed that the original files already have bookmarks in them. They, however, refer to the original page numbers, not the ones in the combined document.

I am looking for one of two possible solutions: remove the original bookmarks, or make use of the original bookmarks but somehow update their page references...

DrSAR asked Nov 09 '11 19:11


2 Answers

As is so often the case, someone has walked the same path before you...

Trevor King at unfolding disasters has worked out a solution to this very problem. His Python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the bookmark information. It then keeps track of the page count of each input document and offsets the page number in each pdfmark instruction by the total page count of all the PDF documents that precede the current one. So it is close to, but not the same as, the 2-pass approach of KenS: it first discovers the bookmarks using pdftk and then creates a new bookmark file with correct page numbers. It also manages to turn the original pdfmark instructions (which gs would normally preserve) into no-ops. I won't pretend I understand how that last part works ...

However, the script does all I need, including the option of tweaking the bookmark file before the final write. Very neat, and hat tip to Trevor King.
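To make the page-offset arithmetic concrete, here is a minimal Python sketch. It is not taken from pdf-merge.py; the function names, the (title, level, page) tuple layout, and the sample data are all assumptions made for illustration. It only shows the bookkeeping: each file's bookmarks are shifted by the number of pages in the files merged before it.

```python
from itertools import accumulate


def page_offsets(page_counts):
    """For each input file, return how many pages precede it in the merged PDF."""
    # accumulate(..., initial=0) needs Python 3.8 or newer.
    return list(accumulate(page_counts, initial=0))[:-1]


def shift_bookmarks(documents):
    """Renumber bookmarks against the merged file.

    documents: list of (page_count, bookmarks) in merge order, where
    bookmarks is a list of (title, level, page) with 1-based page numbers
    relative to the file they came from (hypothetical layout).
    """
    counts = [count for count, _ in documents]
    merged = []
    for offset, (_, bookmarks) in zip(page_offsets(counts), documents):
        for title, level, page in bookmarks:
            merged.append((title, level, page + offset))
    return merged


# Made-up example data: three files with 4, 30 and 25 pages.
docs = [
    (4, [("Preface", 1, 1)]),
    (30, [("Chapter 1", 1, 1), ("Section 1.1", 2, 3)]),
    (25, [("Chapter 2", 1, 1)]),
]

for bookmark in shift_bookmarks(docs):
    print(bookmark)
```

With this sample data, "Chapter 1" moves from page 1 to page 5 and "Chapter 2" from page 1 to page 35, because 4 and 34 pages respectively come before them in the combined document.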

DrSAR answered Jun 18 '23 16:06


In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.

However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.

So you need a 2-pass approach: first merge all the files, discarding the bookmarks; then 'convert' the merged file and add pdfmarks to set the correct bookmarks.

There is currently no pdfwrite option to stop it preserving bookmarks. I think you would need to modify the Ghostscript PDF interpreter's PostScript files to achieve this. You might try setting -dDOPDFMARKS=false, but I doubt that will work.
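For the second pass, the bookmarks have to be written out as pdfmark instructions. The following Python sketch shows one way to generate them from a list of (title, level, page) tuples that already use the merged page numbering. The function names and the tuple layout are assumptions for illustration, and the sketch assumes ASCII titles and levels that never skip a step; non-ASCII titles would need a different string encoding.

```python
def ps_string(text):
    """Wrap text as a PostScript string literal, escaping \\, ( and )."""
    # The backslash must be escaped first so later escapes are not doubled.
    for char in ("\\", "(", ")"):
        text = text.replace(char, "\\" + char)
    return "(" + text + ")"


def to_pdfmarks(bookmarks):
    """Render (title, level, page) tuples as /OUT pdfmark lines.

    /Count gives the number of immediate children of an entry; it is
    what tells the PDF writer how the outline entries nest.
    """
    lines = []
    for index, (title, level, page) in enumerate(bookmarks):
        children = 0
        for _, later_level, _ in bookmarks[index + 1:]:
            if later_level <= level:
                break  # reached a sibling or an ancestor's sibling
            if later_level == level + 1:
                children += 1
        parts = []
        if children:
            parts.append(f"/Count {children}")
        parts.append(f"/Title {ps_string(title)}")
        parts.append(f"/Page {page}")
        parts.append("/OUT pdfmark")
        lines.append("[" + " ".join(parts))
    return "\n".join(lines) + "\n"


# Made-up bookmarks, already numbered against the merged file.
print(to_pdfmarks([
    ("Chapter 1", 1, 5),
    ("Section 1.1", 2, 7),
    ("Chapter 2", 1, 35),
]), end="")
```

Saved to a file (say bookmarks.pdfmark, a hypothetical name), the output can be passed to gs as an additional input after the merged PDF, for example gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf merged.pdf bookmarks.pdfmark. This only gives a clean result if the merged file no longer carries the old bookmarks.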

KenS answered Jun 18 '23 15:06