Change metadata of pdf file with pypdf2

4 Answers

I was surprised to see there is no code sample for PyPDF2 when the questions is explicitly asking for PyPDF2, so here it is:

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("source.pdf")
writer = PdfFileWriter()

writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
writer.addMetadata(metadata)

# Write your custom metadata here:
writer.addMetadata({"/Some": "Example"})

with open("result.pdf", "wb") as fp:
    writer.write(fp)

answered Oct 16 '22 18:10

Cyril N.

You can do that using pdfrw

pip install pdfrw

Then run

from pdfrw import PdfReader, PdfWriter   
trailer = PdfReader("myfile.pdf")    
trailer.Info.WhoAmI = "Tarun Lalwani"    
PdfWriter("edited.pdf", trailer=trailer).write()

And then check the PDF Custom Properties

EditedProperties

answered Oct 16 '22 18:10

Tarun Lalwani

The correct way to edit PDF metadata in Python

There are several ways to edit PDF metadata in Python, but one way is better than the others.

I will start by talking about other ways that seem right but have side effects. Skip to the end of this article if you don’t have enough time and just use the correct way.

Weakness is package not maintained.

from pdfrw import PdfReader, PdfWriter, PdfDict

if __name__ == '__main__':
    pdf_reader = PdfReader('old.pdf')
    metadata = PdfDict(Author='Someone', Title='PDF in Python')
    pdf_reader.Info.update(metadata)
    PdfWriter().write('new.pdf', pdf_reader)

pdfrw can do quite easily without losing non-display information such as bookmarks.

PyPDF2 supports more PDF features than pdfrw, including decryption and more types of decompression.

Weakness is PDF not preserve outlines(bookmarks).

import pprint

from PyPDF2 import PdfFileReader, PdfFileWriter

if __name__ == '__main__':
    file_in = open('old.pdf', 'rb')
    pdf_reader = PdfFileReader(file_in)
    metadata = pdf_reader.getDocumentInfo()
    pprint.pprint(metadata)

    pdf_writer = PdfFileWriter()
    pdf_writer.appendPagesFromReader(pdf_reader)
    pdf_writer.addMetadata({
        '/Author': 'Someone',
        '/Title': 'PDF in Python'
    })
    file_out = open('new.pdf', 'wb')
    pdf_writer.write(file_out)

    file_in.close()
    file_out.close()

Using PdfFileWriter create a new PDF, and get old contents through appendPagesFromReader(), then addMetadata().

It seems that we cannot directly modify the PDF metadata, so we add all pages and metadata then write out to a new file.

The correct way to edit PDF metadata in Python.

import pprint

from PyPDF2 import PdfFileReader, PdfFileMerger

if __name__ == '__main__':
    file_in = open('old.pdf', 'rb')
    pdf_reader = PdfFileReader(file_in)
    metadata = pdf_reader.getDocumentInfo()
    pprint.pprint(metadata)

    pdf_merger = PdfFileMerger()
    pdf_merger.append(file_in)
    pdf_merger.addMetadata({
        '/Author': 'Someone',
        '/Title': 'PDF in Python'
    })
    file_out = open('new.pdf', 'wb')
    pdf_merger.write(file_out)

    file_in.close()
    file_out.close()

Using PdfFileMerger concatenate pages through append().

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

References

pdfrw: the other Python PDF library
Reading and writing pdf metadata

answered Oct 16 '22 16:10

conan

Building on what Cyril N. stated, the code works fine, but it creates a lot of "trash" files since now you have the original file and the file with the metadata.

I changed the code a bit since I will run this on hundreds of files a day, and don't want to deal with the additional clean-up:

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("your_original.pdf")
writer = PdfFileWriter()

writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
writer.addMetadata(metadata)

# Write your custom metadata here:
writer.addMetadata({"/Title": "this"})

with open("your_original.pdf", "ab") as fout:
    # ab is append binary; if you do wb, the file will append blank pages
    writer.write(fout)

If you do want to have it as a new file, just use a different name for the pdf in fout and keep ab. If you use wb, you will append blank pages equal to your original file.

answered Oct 16 '22 16:10

Duncan_Idaho

Related questions
                            
                                Maketrans not working for petl with python3.4
                            
                                how to know python semaphore value
                            
                                Unable to install python pip on Ubuntu 14.04
                            
                                page number in pdf converted from html - pdfkit, python/django
                            
                                How to set only one word in a specific color using Click.secho
                            
                                Using Pytube to download playlist from YouTube
                            
                                Can anyone point out the pros and cons of TG2 over Django?
                            
                                How to get process's grandparent id
                            
                                negative pow in python
                            
                                How to copy and paste by using Keyboard in python? [closed]
                            
                                How do I change the widget type of the DELETE field in a django formset
                            
                                How to automatically capitalize field on form submission in Django?
                            
                                How can I create lists from a list of strings? [duplicate]
                            
                                Average of two consecutive elements in the list in Python
                            
                                Prevent other classes' methods from calling my constructor
                            
                                Python Requests - Use navigate site by servers IP
                            
                                Debugging django-allauth Social Network Login Failure
                            
                                print a list of dictionaries in table form
                            
                                How to convert millisecond time stamp to normal date in Python?
                            
                                Atom IDE autocomplete-python not working

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Change metadata of pdf file with pypdf2

Tags:

python

pdf

pypdf2

pdf-manipulation

guettli

People also ask