I am building a simple script which converts the first page of a PDF into an image using Ghostscript. Here is the command I use:
gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 input.pdf
This works beautifully with some PDFs, e.g. if I convert the first page of a PDF that looks like this:
I actually get this first page as an image and there aren't any problems.
But I have noticed that with some first pages of other PDFs, like the following:
With the same gs command, after the conversion, the .png image looks like this:
The problem is that the converted image has extra white space on the left. Why does Ghostscript do this? Where does that extra blank space come from?
To be completely safe from Ghostscript exploits, users need to disable support for other formats, such as the popular PDF format, because those files can also embed malicious PostScript code, which Ghostscript would execute as well. The use of ImageMagick with an enabled Ghostscript back end is widespread.
Ghostscript is capable of interpreting PostScript, encapsulated PostScript (EPS), DOS EPS (EPSF), and -- if the executable was built for it -- Adobe Portable Document Format (PDF). The interpreter reads and executes the files in sequence, using the method described under "File searching" to find them.
The option -c quit executes the quit command and therefore terminates Ghostscript. You can execute arbitrary commands with -c, so you can simply append quit to the commands you pass for the conversion. Alternatively, you can feed the commands on stdin, and Ghostscript will terminate at end-of-file.
Most likely, your PDFs do not use identical values for /MediaBox and for /CropBox. For details about these technical terms related to a page, see this illustration from the German Wikipedia:
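In other words: the /CropBox values (if given) for a PDF page determine which (smaller) part of the overall page information (which is inside the /MediaBox) the PDF viewer should make visible to the user (or send to the printer). By default, Ghostscript renders the full /MediaBox, so any area outside the /CropBox shows up in the image as extra space.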
To see the box values for all the pages of your book(s) (up to the first 1000 pages), run this command:
pdfinfo -f 1 -l 1000 -box my.pdf
To see these values just for the first page, run:
pdfinfo -l 1 -box my.pdf
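For a long document it can help to list only the pages where the two boxes disagree. The following is a minimal sketch, not part of pdfinfo itself: boxes_differ is a hypothetical helper name, and it assumes the per-page lines printed by pdfinfo -box look like "Page    1 MediaBox:     0.00     0.00   595.00   842.00" (the format may vary between Poppler/Xpdf versions).

```shell
#!/bin/sh
# boxes_differ (hypothetical helper): read `pdfinfo -f N -l M -box` output
# on stdin and print every page whose CropBox differs from its MediaBox.
# Assumed input line format (may differ between pdfinfo versions):
#   Page    1 MediaBox:     0.00     0.00   595.00   842.00
boxes_differ() {
    awk '
        # Remember the four MediaBox coordinates of each page.
        $1 == "Page" && $3 == "MediaBox:" {
            media[$2] = $4 " " $5 " " $6 " " $7
        }
        # Compare the CropBox of the same page against the stored MediaBox.
        $1 == "Page" && $3 == "CropBox:" {
            crop = $4 " " $5 " " $6 " " $7
            if (($2 in media) && media[$2] != crop)
                print "page " $2 ": MediaBox [" media[$2] "] CropBox [" crop "]"
        }
    '
}
```

It would be used as a filter, for example: pdfinfo -f 1 -l 1000 -box my.pdf | boxes_differ. Pages that print nothing have identical boxes and should convert without extra margins.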
For Ghostscript to give the results you want, add -dUseCropBox to your command line:
gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 -dUseCropBox input.pdf
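Since the question mentions a script, the command above could be wrapped roughly as follows. This is only a sketch: first_page_png is a hypothetical function name, and the GS variable is an assumed convenience for pointing at a different Ghostscript binary (it defaults to gs).

```shell
#!/bin/sh
# first_page_png INPUT.pdf OUTPUT.png  (hypothetical helper)
# Render page 1 of INPUT.pdf to OUTPUT.png, clipped to the page's /CropBox.
# GS is an assumed override for the Ghostscript executable; default is "gs".
first_page_png() {
    if [ "$#" -ne 2 ]; then
        echo "usage: first_page_png INPUT.pdf OUTPUT.png" >&2
        return 2
    fi
    # Same flags as the command above; quoting keeps paths with spaces intact.
    "${GS:-gs}" -q -o "$2" -sDEVICE=pngalpha -dLastPage=1 -dUseCropBox "$1"
}
```

For example: first_page_png input.pdf output.png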