
When converting the first page of a PDF into an image using Ghostscript, I sometimes get "extra" space. Why?

I am building a simple script which converts the first page of a PDF into an image using Ghostscript. Here is the command I use:

gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 input.pdf 

This works beautifully with some PDFs, e.g. if I convert the first page of a PDF that looks like this:

enter image description here

I actually get this first page as an image and there aren't any problems.

But I have noticed that with some first pages of other PDFs, like the following:

enter image description here

With the same gs command, after the conversion, the .png image looks like this:

enter image description here

The problem is that the converted image has extra white space on the left. Why does Ghostscript do this? Where does that extra blank space come from?

tonix asked Jan 05 '15 12:01



1 Answer

Most likely, your PDFs do not use identical values for /MediaBox and for /CropBox. For details about these technical terms related to a page, see this illustration from the German Wikipedia:

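In other words: the /CropBox values (if given) for a PDF page determine which (smaller) part of the overall page content (everything inside the /MediaBox) a PDF viewer should make visible to the user (or send to the printer). Ghostscript renders the full /MediaBox by default, so any area that lies outside the /CropBox shows up as extra blank space in the image.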

Solution

To list the box values for all the pages of your PDF(s), run this command:

pdfinfo -f 1 -l 1000 -box my.pdf

To see these values just for the first page, run

pdfinfo -l 1 -box my.pdf
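If the PDF has many pages, scanning that output by eye gets tedious. The following is only a rough sketch of a filter that reports the pages whose /CropBox differs from their /MediaBox. The function name compare_boxes is made up for illustration, and the sketch assumes that pdfinfo prints each box as a "MediaBox:" or "CropBox:" label followed by four numbers, optionally prefixed with "Page N"; check this against the output of your pdfinfo version.

```shell
#!/bin/sh
# compare_boxes (hypothetical helper): reads `pdfinfo -box` output on stdin
# and prints one line for each page whose CropBox differs from its MediaBox.
# Assumption: each box line looks like "[Page N] MediaBox: x0 y0 x1 y1".
compare_boxes() {
    awk '
        # Return the four numbers that follow the "<name>:" field.
        function coords(name,    i) {
            for (i = 1; i <= NF; i++)
                if ($i == (name ":"))
                    return $(i+1) " " $(i+2) " " $(i+3) " " $(i+4)
            return ""
        }
        /MediaBox:/ {
            media = coords("MediaBox")
            # Without a "Page N" prefix, treat the line as page 1.
            page = ($1 == "Page") ? $2 : 1
        }
        /CropBox:/ {
            crop = coords("CropBox")
            if (crop != media)
                print "page " page ": MediaBox [" media "] CropBox [" crop "]"
        }
    '
}
```

It would be used as `pdfinfo -f 1 -l 1000 -box my.pdf | compare_boxes`. Any page it lists is a candidate for the extra-space problem described in the question.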

For Ghostscript to give the results you want, add -dUseCropBox to your command line:

gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 -dUseCropBox input.pdf 
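
Since the question mentions building a script around this command, here is a minimal sketch of how the same invocation might be applied to every PDF in a directory. The function name first_pages_to_png and the GS override variable are assumptions made for illustration and are not part of Ghostscript; the gs options themselves are exactly the ones shown above.

```shell
#!/bin/sh
# first_pages_to_png (hypothetical helper): renders the first page of every
# *.pdf in a directory to a PNG next to it, clipped to the CropBox.
# GS is an illustrative override, e.g. GS=echo prints the arguments instead
# of running Ghostscript.
first_pages_to_png() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip when the glob matches nothing
        "${GS:-gs}" -q -o "${pdf%.pdf}.png" -sDEVICE=pngalpha \
            -dLastPage=1 -dUseCropBox "$pdf"
    done
}
```

Calling `first_pages_to_png /path/to/pdfs` would write one .png per PDF; running it once with `GS=echo` first is a cheap way to check the generated arguments.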

Kurt Pfeifle answered Nov 15 '22 05:11