PyPDF2 nested bookmarks with same name not working

When you nest several bookmarks with the same name, PyPDF2 does not respect the nesting. Below is self-contained Python code that shows what I mean (you need three PDF files named a.pdf, b.pdf and c.pdf in the working folder to run it):

from PyPDF2 import PdfFileReader, PdfFileMerger


def main():
    merger = PdfFileMerger()
    first_one = True
    for file in ["a.pdf", "b.pdf", "c.pdf"]:
        print("next row")
        reader = PdfFileReader(file)
        merger.append(reader)
        if first_one:
            child = merger.addBookmark(title="blabla", pagenum=1)
            first_one = False
        else:
            child = merger.addBookmark(title="blabla", pagenum=1, parent=child)

    merger.write("test.pdf")


if __name__ == "__main__":
    main()

I would expect the resulting PDF to have three levels of nested bookmarks:

blabla
    blabla
        blabla

but instead I get:

blabla
    blabla
    blabla

Is there any way to make sure this does not happen?

EDIT: I have removed the pagenum variable, as I want those 3 bookmarks to point to the same page.

Chapo asked Mar 22 '17 02:03


1 Answer

This seems to be a bug in the PdfFileMerger.addBookmark() method. There is some detail here

Below is a work-around using PdfFileWriter and its addBookmark() method. With it I get 3 nested bookmarks with the same name, all pointing to the same page:

blabla
    blabla
        blabla

Code using PdfFileWriter work-around:

from PyPDF2 import PdfFileReader, PdfFileWriter


def main():
    writer = PdfFileWriter()
    pagenum = 0
    first_one = True
    for file in ["a.pdf", "b.pdf", "c.pdf"]:
        print("next row")
        reader = PdfFileReader(file)
        writer.appendPagesFromReader(reader)
        if first_one:
            child = writer.addBookmark(
                title="blabla", pagenum=pagenum, parent=None
            )
            first_one = False
        else:
            child = writer.addBookmark(
                title="blabla", pagenum=pagenum, parent=child
            )

    with open("test.pdf", "wb") as d:
        writer.write(d)


if __name__ == "__main__":
    main()
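
If the first_one flag feels clumsy, the same chaining can be written by starting with parent=None and reusing each returned reference as the next parent. This is only a minimal sketch, assuming the PyPDF2 1.x PdfFileWriter.addBookmark(title, pagenum, parent) call used above; the helper name add_nested_bookmarks is made up for illustration:

```python
def add_nested_bookmarks(writer, titles, pagenum=0):
    """Add one bookmark per title, each nested under the previous one.

    Assumption: `writer` is a PyPDF2 1.x PdfFileWriter (or any object with
    the same addBookmark(title, pagenum, parent) method) that already
    contains the page `pagenum` points to.
    """
    parent = None  # the first bookmark sits at the top level
    created = []
    for title in titles:
        # addBookmark returns a reference that can serve as the next parent
        parent = writer.addBookmark(title, pagenum, parent=parent)
        created.append(parent)
    return created
```

After appending the pages with writer.appendPagesFromReader(reader), a call such as add_nested_bookmarks(writer, ["blabla"] * 3) should build the same three-level chain as the loop above.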

Alternatively, I had a go at modifying the PyPDF2 library to resolve this issue, although I'm not very experienced with Python, so I may have introduced other issues. I have submitted a pull request to the maintainers, but until it is merged you could clone my fork and install PyPDF2 from there:

git clone https://github.com/khalida/PyPDF2.git
cd PyPDF2
python setup.py sdist
sudo -H pip uninstall -y PyPDF2
sudo -H pip install dist/PyPDF2-1.26.0.tar.gz

After that you should be able to get the nesting you want from PdfFileMerger.addBookmark(). I've tested it for the case above, but haven't done any testing beyond that.
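
To check which nesting you actually got, you could read the outline back from test.pdf. This is a rough sketch, assuming PyPDF2 1.x, where PdfFileReader.getOutlines() returns a list in which a nested list holds the children of the entry just before it; format_outline and print_outline are hypothetical helper names:

```python
def format_outline(outline, depth=0):
    """Return one indented line per bookmark title.

    Assumption: `outline` has the shape returned by PyPDF2 1.x
    PdfFileReader.getOutlines(), i.e. a nested list holds the children
    of the entry that precedes it.
    """
    lines = []
    for item in outline:
        if isinstance(item, list):
            # A nested list is one level deeper than the preceding entry
            lines.extend(format_outline(item, depth + 1))
        else:
            # Entries are dict-like and store their title under "/Title"
            lines.append("    " * depth + str(item["/Title"]))
    return lines


def print_outline(path):
    # Imported here so format_outline can be used without PyPDF2 installed
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as f:
        reader = PdfFileReader(f)
        print("\n".join(format_outline(reader.getOutlines())))
```

Running print_outline("test.pdf") on the merged file should show each title indented one step further than its parent if the nesting worked, and two titles at the same indentation if it did not.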

kabdulla answered Oct 12 '22 21:10