Extract a region of a PDF page by coordinates

Question

I am looking for a tool to extract a given rectangular region (by coordinates) of a 1-page PDF file and produce a 1-page PDF file with the specified region:

# file.pdf is a 1-page PDF file
extract file.pdf 0 0 100 100 > out.pdf
# out.pdf is now a 1-page PDF file with a page of size 100x100
# it contains the region (0, 0) to (100, 100) of file.pdf

I could convert the PDF to an image and crop it with convert, but the resulting PDF would no longer be vector graphics, which is not acceptable (I want to be able to zoom in).

I would ideally like to perform this task with a command-line tool or a Python library.

Thanks!

Steven · Accepted Answer

Using pyPdf, you could do something like this (the output file is passed as the last argument instead of being written to stdout):

import sys
import pyPdf

def extract(in_file, coords, out_file):
    # coords is [x0, y0, x1, y1] in PDF points, with the origin at the
    # lower-left corner of the page
    with open(in_file, 'rb') as infp:
        reader = pyPdf.PdfFileReader(infp)
        page = reader.getPage(0)
        writer = pyPdf.PdfFileWriter()
        # shrink the page's media box to the requested region
        page.mediaBox.lowerLeft = coords[:2]
        page.mediaBox.upperRight = coords[2:]
        # you could do the same for page.trimBox and page.cropBox
        writer.addPage(page)
        # write while the input file is still open: the page is read lazily
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

if __name__ == '__main__':
    # usage: <script> in.pdf x0 y0 x1 y1 out.pdf
    in_file = sys.argv[1]
    coords = [int(i) for i in sys.argv[2:6]]
    out_file = sys.argv[6]

    extract(in_file, coords, out_file)
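
The same idea can be sketched with pypdf, the successor project to pyPdf. This is only an illustration under a few assumptions: pypdf is installed, and the helper names normalized_box and crop_first_page are made up for this example (they are not part of any library). The first helper just orders the two corners so that the region is valid even if they are given the other way round.

```python
def normalized_box(coords):
    """Return (x0, y0, x1, y1) with (x0, y0) as the lower-left corner.

    PDF user space has its origin at the lower-left of the page and is
    measured in points (1/72 inch), so the corners are ordered by value.
    """
    if len(coords) != 4:
        raise ValueError("expected four numbers: x0 y0 x1 y1")
    xa, ya, xb, yb = (float(c) for c in coords)
    x0, x1 = sorted((xa, xb))
    y0, y1 = sorted((ya, yb))
    if x0 == x1 or y0 == y1:
        raise ValueError("region has zero width or height")
    return x0, y0, x1, y1


def crop_first_page(in_file, coords, out_file):
    """Write a one-page PDF whose visible area is the given region."""
    # Assumption: the third-party pypdf package is installed. It is imported
    # here so that normalized_box() stays usable without it.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    box = normalized_box(coords)
    reader = PdfReader(in_file)
    page = reader.pages[0]
    # Set both boxes: viewers display the crop box when one is present.
    page.mediabox = RectangleObject(box)
    page.cropbox = RectangleObject(box)
    writer = PdfWriter()
    writer.add_page(page)
    with open(out_file, "wb") as outfp:
        writer.write(outfp)
```

Calling crop_first_page("file.pdf", [0, 0, 100, 100], "out.pdf") should give a 100x100 page that is still vector graphics. Note that this only changes the page boxes: anything outside the region is hidden, not deleted, so it is still stored in the output file.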

Tags: python, command-line, pdf, extract, crop

a3nm
