
ImageMagick Split PDF Output File Name Always Starts at Zero

I run the following command to split a PDF in ImageMagick:

convert file.pdf[5-10] file.png

The resulting output files are always suffixed starting with zero. That is:

file-0.png, file-1.png, file-2.png...

Any ideas what I might be doing wrong? The documentation states that the files should be suffixed starting at 5, matching the page numbers of the pages extracted.

Csizzle asked Feb 18 '15 14:02



2 Answers

I ended up solving this with the -scene command line option, which sets the number given to the first output image.

This causes the numbered output files to begin at the desired index. For posterity:

convert file.pdf -scene 5 file-%d.png
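
If you also want to keep the page range from the question, the two can be combined. Below is a minimal sketch, not something I have verified across ImageMagick versions: pdf_range_cmd is a made-up helper name, and it only prints the command rather than running it. It assumes the bracketed indexes are zero-based (so 1-based pages 5 to 10 become [4-9]) and that -scene should be set to the first page number.

```shell
#!/bin/sh
# pdf_range_cmd FIRST LAST  (hypothetical helper, not part of ImageMagick)
# Prints a convert command for the 1-based page range FIRST..LAST of file.pdf.
pdf_range_cmd() {
  first=$1
  last=$2
  # Assumption: ImageMagick frame indexes are zero-based, so subtract 1
  # from each page number; -scene makes the output numbering start at FIRST.
  printf 'convert "file.pdf[%d-%d]" -scene %d "file-%%d.png"\n' \
    "$((first - 1))" "$((last - 1))" "$first"
}

pdf_range_cmd 5 10
# prints: convert "file.pdf[4-9]" -scene 5 "file-%d.png"
```

The quotes in the printed command keep the shell from treating the square brackets as a glob pattern when the command is actually run.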
Csizzle answered Oct 06 '22 10:10


You see the result you describe because ImageMagick's page index for multi-page image formats is zero-based: page 1 has index 0, page 2 has index 1, and so on. (This also means that file.pdf[5-10] selects pages 6 through 11 of the PDF, not pages 5 through 10.)

Also, ImageMagick cannot process PDF input files itself: it employs Ghostscript as its 'delegate' -- Ghostscript consumes the PDF first and emits a raster file for each PDF page. Only these raster files are then processed by ImageMagick.

Depending on your exact ImageMagick version and IM setup, this may result in an indirect PNG output generation, and the conversion chain may look like this:

PDF --> PPM (portable pixmap) --> PNG
     ^                         ^
     |                         |
     |                         +-- (handled by ImageMagick)
     +-- (handled by Ghostscript)

If you are unlucky, the result will be slow and the quality may not be as good as it could be.

To verify what exactly happens in a convert a.pdf a.png command, you can add the -verbose parameter. That will show you the Ghostscript command being employed by IM to process the PDF input:

convert -verbose a.pdf a.png

 /var/tmp/magick-15951W3TZ3WRpwIUk1 PNG 612x792 612x792+0+0 8-bit sRGB 3.73KB 0.000u 0:00.000
 a.pdf PDF 612x792 612x792+0+0 16-bit sRGB 3.73KB 0.000u 0:00.000
 a.pdf=>a.png PDF 612x792 612x792+0+0 8-bit sRGB 2c 2.95KB 0.000u 0:00.000

 [ghostscript library] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
   -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" \
   -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" \
  "-sOutputFile=/var/tmp/magick-15951W3TZ3WRpwIUk%d" \
  "-f/var/tmp/magick-15951nJD8-fF8kA7j" \
  "-f/var/tmp/magick-15951JTZDMwtEswHn"

(As you can see, my IM installation is set up to do a PDF->PNG conversion without the detour via PPM... Your mileage may vary.)

You may get better results when using Ghostscript directly, instead of running an IM convert command. (If ImageMagick works at all with PDF->PNG conversion, you have a working Ghostscript installation for sure.) So you can try this:

gs                  \
 -o file-%03d.png   \
 -sDEVICE=pngalpha  \
  file.pdf

The -%03d file name suffix will cause Ghostscript to output file-001.png, file-002.png, file-003.png.
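
For reference, %03d is a printf-style specifier: a decimal number, zero-padded to three digits. The shell's own printf handles this specifier the same way, so you can preview the names with it. A small sketch (page_name is just an illustrative helper name, nothing Ghostscript provides):

```shell
#!/bin/sh
# page_name N  (hypothetical helper): the file name that the pattern
# file-%03d.png yields for counter value N.
page_name() {
  printf 'file-%03d.png\n' "$1"
}

page_name 1    # prints: file-001.png
page_name 12   # prints: file-012.png
```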

However, if you are unlucky and have an older version of Ghostscript installed, the numbering will start at file-000.png instead...

In any case, since your sample command seems to suggest that you want to convert only a page range (5--10) from the PDF file (not all pages), here is the command to use:

gs                  \
 -o file-%03d.png   \
 -sDEVICE=pngalpha  \
 -dFirstPage=5      \
 -dLastPage=10      \
  file.pdf

But the bad news here is: Ghostscript will STILL number the output files from 1, that is, file-001.png (page 5) ... file-006.png (page 10).

To work around that, you'll have to generate the PNGs for the first 4 pages too, and delete them again afterwards:

gs                  \
 -o file-%03d.png   \
 -sDEVICE=pngalpha  \
 -dFirstPage=1      \
 -dLastPage=10      \
  file.pdf

rm -f file-00{1,2,3,4}.png
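
An alternative that avoids rendering the unwanted pages would be to run Ghostscript with -dFirstPage=5 -dLastPage=10 and rename the results afterwards. The following is only a sketch under some assumptions: renumber_pages is a made-up helper, the files follow the file-%03d.png pattern used above, and the numbering starts at 001.

```shell
#!/bin/sh
# renumber_pages PREFIX FIRST COUNT  (hypothetical helper)
# Renames PREFIX-001.png .. PREFIX-<COUNT>.png so that the numbers match
# the real page numbers FIRST .. FIRST+COUNT-1.
renumber_pages() {
  prefix=$1
  first=$2
  count=$3
  offset=$((first - 1))
  i=$count
  # Walk downwards so a rename never overwrites a file that still has to move.
  while [ "$i" -ge 1 ]; do
    src=$(printf '%s-%03d.png' "$prefix" "$i")
    dst=$(printf '%s-%03d.png' "$prefix" "$((i + offset))")
    if [ -e "$src" ] && [ "$src" != "$dst" ]; then
      mv -- "$src" "$dst"
    fi
    i=$((i - 1))
  done
}

# Usage for pages 5 to 10 (six files, file-001.png .. file-006.png):
#   renumber_pages file 5 6
```

The loop counts down on purpose: file-002.png has to become file-006.png, which is only safe once the original file-006.png has already been moved to file-010.png.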
Kurt Pfeifle answered Oct 06 '22 10:10