Create outlines/TOC for existing PDF in Python

I'm using pyPdf to merge several PDF files into one. This works great, but I also need to add a table of contents (outlines/bookmarks) to the generated PDF file.

pyPdf seems to support outlines only for reading. ReportLab can create them, but its open-source version cannot load existing PDF files, so it can't be used to add outlines to an existing file.

Is there any way to add outlines to an existing PDF using Python, or any library that can do it?

asked May 27 '11 20:05

jphoude

3 Answers

I made a Python library just for adding an outline to an existing PDF file: https://github.com/yutayamamoto/pdfoutline

answered Oct 05 '22 23:10

Yuta

It looks like PyPDF2 can do the job. See the addBookmark method in the documentation: https://pythonhosted.org/PyPDF2/PdfFileMerger.html

answered Oct 06 '22 01:10

Watusimoto

We had a similar problem in WeasyPrint: cairo produces the PDF files but does not support bookmarks/outlines or hyperlinks. In the end we bit the bullet, read the PDF spec, and did it ourselves.

WeasyPrint’s pdf.py has a simple PDF parser and writer that can add or override PDF "objects" in an existing document. It uses the PDF "update" mechanism and only appends to the end of the file.
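The update mechanism (called an "incremental update" in the PDF specification) leaves the original bytes untouched and appends new or replacement objects, a new cross-reference section, and a new trailer whose /Prev entry points at the previous cross-reference section. Below is a minimal standard-library sketch of that idea, not WeasyPrint's actual code; append_update and outline_objects are hypothetical names. It assumes a classic (non-stream) xref table and a file ending in startxref/%%EOF, and because it has no parser, the caller must supply the catalog object number and the complete replacement catalog dictionary.

```python
import re


def append_update(pdf, objects, root):
    """Append `objects` ({object number: dictionary bytes}) as an incremental update.

    Numbers that already exist replace the old objects; new numbers should
    continue from the previous /Size. Assumes a classic xref table (not an
    xref stream) and a file that ends with "startxref <offset> %%EOF".
    """
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if tail is None:
        raise ValueError("no startxref/%%EOF at the end of the file")
    prev_xref = int(tail.group(1))
    old_size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])  # last trailer's /Size

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset of "<num> 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])

    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        # One single-entry subsection per object; every entry is 20 bytes.
        out += b"%d 1\n%010d 00000 n \n" % (num, offsets[num])
    size = max(old_size, max(objects) + 1)
    # A complete version would also copy /Info and /ID from the old trailer.
    out += b"trailer\n<< /Size %d /Root %d 0 R /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def outline_objects(first_num, entries):
    """Build an /Outlines dictionary plus one top-level item per entry.

    `entries` is a list of (title, page object number). Returns
    ({object number: bytes}, number of the /Outlines object). Titles are
    written as UTF-16BE hex strings, so no escaping is needed.
    """
    if not entries:
        raise ValueError("need at least one outline entry")
    outlines = first_num
    nums = [first_num + 1 + i for i in range(len(entries))]
    objs = {outlines: b"<< /Type /Outlines /First %d 0 R /Last %d 0 R /Count %d >>"
            % (nums[0], nums[-1], len(nums))}
    for i, ((title, page), num) in enumerate(zip(entries, nums)):
        text = ("FEFF" + title.encode("utf-16-be").hex().upper()).encode("ascii")
        links = b""
        if i > 0:
            links += b" /Prev %d 0 R" % nums[i - 1]
        if i + 1 < len(nums):
            links += b" /Next %d 0 R" % nums[i + 1]
        objs[num] = (b"<< /Title <%s> /Parent %d 0 R /Dest [%d 0 R /Fit]%s >>"
                     % (text, outlines, page, links))
    return objs, outlines
```

For example, for a one-page file whose catalog is object 1 and whose page is object 3, outline_objects(4, [("Intro", 3)]) yields objects 4 and 5. Adding a replacement object 1 that also contains /Outlines 4 0 R and /PageMode /UseOutlines, then calling append_update(pdf, objs, root=1), should give a file that viewers open with a one-entry bookmark panel.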

This module was made for internal use only, but I’m open to refactoring it to make it easier to use in other projects.

However, the parser takes a few shortcuts and cannot parse all valid PDF files. It may need to be adapted if PyPDF’s output is not as nice as cairo’s. From the module’s docstring:

Rather than trying to parse any valid PDF, we make some assumptions that hold for cairo in order to simplify the code:

All newlines are '\n', not '\r' or '\r\n'

Except for number 0 (which is always free) there is no "free" object.

Most white space separators are made of a single 0x20 space.

Indirect dictionary objects do not contain '>>' at the start of a line except to mark the end of the object, followed by 'endobj'. (In other words, '>>' markers for sub-dictionaries are indented.)

The Page Tree is flat: all kids of the root page node are page objects, not page tree nodes.
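These shortcuts are what keep such a parser small. As an illustration only (this is not WeasyPrint's code, and read_object, find_root and page_numbers are hypothetical names), the sketch below relies on them directly: the first '>>' at the start of a line ends an object, separators are single spaces, and the page tree is flat, so the pages are simply the /Kids of the root page node. On a file that breaks any of these assumptions it will return wrong results or raise.

```python
import re


def read_object(pdf, num):
    """Return the dictionary of indirect object `num`, using its latest definition.

    Relies on cairo-style output: '\\n' newlines, single spaces, and the
    object's closing '>>' being the first '>>' at the start of a line.
    """
    marker = b"\n%d 0 obj\n" % num
    start = pdf.rfind(marker)  # incremental updates append newer copies
    if start < 0:
        raise KeyError(num)
    body_start = start + len(marker)
    end = pdf.index(b"\n>>", body_start)  # sub-dictionaries' '>>' are indented
    return pdf[body_start:end + 3]


def find_root(pdf):
    """Catalog object number, taken from the last trailer's /Root entry."""
    return int(re.findall(rb"/Root (\d+) 0 R", pdf)[-1])


def page_numbers(pdf, root):
    """Object numbers of all pages, assuming a flat page tree."""
    catalog = read_object(pdf, root)
    pages_num = int(re.search(rb"/Pages (\d+) 0 R", catalog).group(1))
    kids = re.search(rb"/Kids \[([^\]]*)\]", read_object(pdf, pages_num)).group(1)
    # Flat tree: every kid is a /Page, so there is no recursion into /Pages nodes.
    return [int(n) for n in re.findall(rb"(\d+) 0 R", kids)]
```

A real implementation would at least check that each kid's /Type is /Page and fail loudly otherwise, rather than silently returning intermediate nodes.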

answered Oct 06 '22 00:10

Simon Sapin


Tags: python, pdf, pypdf, reportlab