
Cropping a region from a PDF page with PDFBox

Tags:

pdf

pdfbox

I am trying to crop a region out of a PDF page programmatically. Specifically, my input will be a single-page PDF and a bounding box on that page. The output should be a PDF that contains the characters, graphics paths and images from the original PDF, and it should look like the original. In other words, I want a function similar to cropping a region out of an image, but for PDFs.

Three questions:

  1. Is this possible at all? From my knowledge of PDFs, it seems so, but I'm no expert, so I would first like to know whether I'm missing something.

  2. Is there any open source software for this?

  3. Can PDFBox do this currently? I couldn't find such functionality, but I might have missed it. Does anybody know of any attempt at doing this?

asked Oct 19 '22 at 13:10 by rivu


1 Answer

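1- Yes. Every PDF page can have a crop box (the /CropBox entry), which defines the region a viewer displays or prints; the page contents are clipped to it.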

2- Yes, e.g. PDFBox.

3- Yes, just open a PDF, set a crop box, and save it:

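import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
// PDFBox 2.x; in 3.x, load with Loader.loadPDF(file) instead
try (PDDocument doc = PDDocument.load(new File(...))) {
    PDPage page = doc.getPage(0);
    page.setCropBox(new PDRectangle(20, 20, 200, 400)); // x, y, width, height
    doc.save(...);
} // try-with-resources closes the document even if saving fails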

The PDRectangle arguments are x, y, width and height in user space units, where (x, y) is the lower-left corner of the rectangle and 1 unit = 1/72 inch by default.
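If the bounding box comes from a rendered image of the page (origin at the top left, y pointing down, sizes in pixels), it has to be converted before it can be used as a crop box. Below is a minimal sketch of that conversion, not part of PDFBox: the class and method names (CropBoxMath, imageBoxToUserSpace) are hypothetical, and it assumes an unrotated page whose media box starts at (0, 0) and has no /UserUnit entry.

```java
/**
 * Hypothetical helper (not part of PDFBox). Converts a box measured on a
 * rendered page image (origin top-left, y down, pixels) into the x, y,
 * width, height values a PDRectangle expects (origin bottom-left, y up,
 * user space units of 1/72 inch). Assumes an unrotated page whose media
 * box starts at (0, 0) and no /UserUnit entry.
 */
public final class CropBoxMath {
    public static final double UNITS_PER_INCH = 72.0;

    private CropBoxMath() {}

    /** Converts inches to default user space units. */
    public static double inchesToUnits(double inches) {
        return inches * UNITS_PER_INCH;
    }

    /** Converts a length in pixels, rendered at the given DPI, to user space units. */
    public static double pixelsToUnits(double pixels, double dpi) {
        return pixels * UNITS_PER_INCH / dpi;
    }

    /**
     * Returns {x, y, width, height} in user space units for an image box
     * given as (left, top, width, height) in pixels rendered at dpi.
     */
    public static double[] imageBoxToUserSpace(double left, double top,
                                               double width, double height,
                                               double dpi, double pageHeightUnits) {
        double x = pixelsToUnits(left, dpi);
        double w = pixelsToUnits(width, dpi);
        double h = pixelsToUnits(height, dpi);
        // Flip the y axis: PDF measures the box's bottom edge from the page bottom.
        double y = pageHeightUnits - pixelsToUnits(top, dpi) - h;
        return new double[] {x, y, w, h};
    }

    public static void main(String[] args) {
        // US Letter is 8.5 x 11 inches, i.e. 612 x 792 units.
        double pageHeight = inchesToUnits(11);
        // A 600 x 450 px box at (150, 300) px on an image rendered at 150 DPI.
        double[] r = imageBoxToUserSpace(150, 300, 600, 450, 150, pageHeight);
        // prints "x=72.0 y=432.0 w=288.0 h=216.0"
        System.out.println("x=" + r[0] + " y=" + r[1] + " w=" + r[2] + " h=" + r[3]);
    }
}
```

The four values can then be passed as `new PDRectangle((float) r[0], (float) r[1], (float) r[2], (float) r[3])`. A rotated page or a media box that does not start at (0, 0) would need extra offsets that this sketch ignores.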

Note that the content outside the crop box is not removed, only hidden: it is still in the page's content stream.
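To make the difference between hidden and removed concrete, here is a plain-Java sketch (no PDFBox involved; the CropVisibility class is hypothetical) that classifies an object's bounding box against a crop box the way a viewer clips it. Only what is displayed changes; objects in all three categories remain in the file.

```java
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical illustration (not PDFBox code). Setting a crop box does not
 * modify the content stream; a viewer only clips what it draws to the box.
 */
public final class CropVisibility {
    public enum Visibility { VISIBLE, PARTLY_CLIPPED, HIDDEN_BUT_PRESENT }

    private CropVisibility() {}

    public static Visibility classify(Rectangle2D cropBox, Rectangle2D objectBounds) {
        if (cropBox.contains(objectBounds)) {
            return Visibility.VISIBLE;
        }
        if (cropBox.intersects(objectBounds)) {
            return Visibility.PARTLY_CLIPPED;
        }
        // Not drawn by a viewer, but still stored in the page's content stream.
        return Visibility.HIDDEN_BUT_PRESENT;
    }

    public static void main(String[] args) {
        // Same crop box as above: x=20, y=20, width=200, height=400.
        // Both rectangles share one coordinate system, so axis direction does not matter here.
        Rectangle2D crop = new Rectangle2D.Double(20, 20, 200, 400);
        System.out.println(classify(crop, new Rectangle2D.Double(50, 50, 100, 20)));   // VISIBLE
        System.out.println(classify(crop, new Rectangle2D.Double(200, 100, 100, 20))); // PARTLY_CLIPPED
        System.out.println(classify(crop, new Rectangle2D.Double(300, 500, 100, 20))); // HIDDEN_BUT_PRESENT
    }
}
```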

answered Oct 22 '22 at 03:10 by Tilman Hausherr