Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Change metadata of pdf file with pypdf

I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is readonly. Is there a way to access this metadata r/w?

If answer positive, a piece of code would be appreciated.

Thanks

like image 599
Baudouin Tamines Avatar asked Apr 04 '10 14:04

Baudouin Tamines


1 Answers

You can manipulate the title with pyPDF (sort of). I came across this post on the reportlab-users listing:

http://two.pairlist.net/pipermail/reportlab-users/2009-November/009033.html

You can also use pypdf. http://pybrary.net/pyPdf/

This won't let you edit the metadata per se, but will let you read one or more pdf file(s) and spit them back out, possibly with new metadata.

Here's the relevant code:

from pyPdf import PdfFileWriter, PdfFileReader
from pyPdf.generic import NameObject, createStringObject

OUTPUT = 'output.pdf'
INPUTS = ['test1.pdf', 'test2.pdf', 'test3.pdf']

# There is no interface through pyPDF with which to set this other then getting
# your hands dirty like so:
infoDict = output._info.getObject()
infoDict.update({
    NameObject('/Title'): createStringObject(u'title'),
    NameObject('/Author'): createStringObject(u'author'),
    NameObject('/Subject'): createStringObject(u'subject'),
    NameObject('/Creator'): createStringObject(u'a script')
})

inputs = [PdfFileReader(i) for i in INPUTS]
for input in inputs:
    for page in range(input.getNumPages()):
        output.addPage(input.getPage(page))

outputStream = file(OUTPUT, 'wb')
output.write(outputStream)
outputStream.close()
like image 134
Mark Lavin Avatar answered Oct 20 '22 09:10

Mark Lavin