Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Editing PDF attributes using sed

I'm trying to develop a python script for blender to output a rendered image sequence to a PDF. I am using Imagemagick to convert to PDF, that part is working fine, However, I want the thumbnail preview to also be included in the PDF.

The PDF format is a bit confusing to me, but I have found the /PageMode and /UseThumbs tags and how to insert them properly into the file. I can do this manually and it works pretty well. But I have been trying to get a similar result without the need to do it manually, I am writing a script after all. Here is an example snippet of the header data in the PDF, with the added tags:

%PDF-1.3 
1 0 obj
<<
/Pages 2 0 R
/PageMode
/UseThumbs
/Type /Catalog
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [ 3 0 R 17 0 R 31 0 R ]
/Count 3
>>

I am trying to use sed to insert the tags as needed on the 4th and 5th lines, Which is also working, but when I open the PDF, the images are corrupted. cryptically, when I compare the manually edited PDF (which isn't corrupted) to the sed edited PDF(which is corrupted) in notepad++, there is no difference in the files, that I can find. There is a different character count, but I cannot find the location of the difference

I understand that PDFs have an offset cross reference table, but it seems strange to me that doing it by hand doesn't corrupt anything, but doing it with sed creates corruption

What am I doing wrong?

like image 859
user3203567 Avatar asked Nov 02 '22 07:11

user3203567


1 Answers

You really don't want to be doing this from sed. Some PDFs may look like line-oriented text files, but they most assuredly are not.

Since you're already using Python, you can use a Python library for this task.

pdfrw will do this fine for you from pure Python. It will slurp in a PDF file, and rebuild it with whatever changes you want and set file offsets correctly. The following snippet of code should set /PageMode to /UseThumbs in the /Root dictionary of the PDF:

    from pdfrw import PdfReader, PdfWriter, PdfName

    trailer = PdfReader('myfile.pdf')
    trailer.Root.PageMode = PdfName.UseThumbs
    PdfWriter().write('mynewfile.pdf', trailer)

Disclaimer: I am the pdfrw author.

like image 107
Patrick Maupin Avatar answered Nov 15 '22 05:11

Patrick Maupin