
Any tips for speeding up GhostScript?

I have a 100-page PDF that is about 50 MB. I am running the script below against it and it's taking about 23 seconds per page. The PDF is a scan of a paper document.

gswin32.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 
            -dPDFSETTINGS=/screen -sOutputFile=out4.pdf 09.pdf

Is there anything I can do to speed this up? I've determined that the -dPDFSETTINGS=/screen is what is making it so slow, but I'm not getting good compression without it...

UPDATE: OK, I tried updating it to what I have below. Am I using the -c 30000000 setvmthreshold portion correctly?

gswin32.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 
            -dPDFSETTINGS=/screen -dNumRenderingThreads=2 -sOutputFile=out7.pdf 
            -c 30000000 setvmthreshold -f 09.pdf
asked Dec 28 '10 by Abe Miessler

5 Answers

If you are on a multicore system, make it use multiple CPU cores with:

-dNumRenderingThreads=<number of cpus>

Let it use up to 30 MB of RAM:

-c "30000000 setvmthreshold"

Try disabling the garbage collector:

-dNOGC
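Put together with the original command from the question, a full invocation using these options might look like the following sketch (the thread count and file names are placeholders; adjust to your machine):

```shell
REM Sketch: the three options above combined on a hypothetical 4-core
REM Windows machine; input.pdf and output.pdf are placeholder names.
gswin32.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite ^
            -dPDFSETTINGS=/screen ^
            -dNumRenderingThreads=4 ^
            -dNOGC ^
            -sOutputFile=output.pdf ^
            -c "30000000 setvmthreshold" -f input.pdf
```

Note that the -c ... -f pair has to come last, after the other switches, so the PostScript snippet runs before the input file is processed.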

For more details, see the Improving Performance section of the Ghostscript docs.

answered Nov 14 '22 by ismail

I was crunching a ~300 page PDF on a core i7 and found that adding the following options provided a significant speedup:

                            %-> comments to the right 
-dNumRenderingThreads=8     % increasing up to 64 didn't make much difference
-dBandHeight=100            % didn't matter much
-dBandBufferSpace=500000000 % (500MB)
-sBandListStorage=memory    % may or may not need to be set when gs is compiled
-dBufferSpace=1000000000    % (1GB)

The -c 1000000000 setvmthreshold -f thing didn't make much difference for me, FWIW.
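For reference, the options above combine into a single command along these lines (a sketch only: the device, file names, and buffer sizes are illustrative, assuming a pdfwrite job like the one in the question):

```shell
# Sketch: banding-related flags from above in one invocation.
# document.pdf / out.pdf are placeholders; tune the sizes to your RAM.
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -dPDFSETTINGS=/screen \
   -dNumRenderingThreads=8 \
   -dBandHeight=100 \
   -dBandBufferSpace=500000000 \
   -sBandListStorage=memory \
   -dBufferSpace=1000000000 \
   -sOutputFile=out.pdf document.pdf
```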

answered Nov 14 '22 by wpgalle3


You don't say what CPU and how much RAM your computer is equipped with.

Your situation is this:

  • A scanned document as PDF, sized about 500 kB per page on average. That means each page basically is a picture at the scan resolution (at least 200 dpi, maybe even 600 dpi).
  • You are re-distilling it with Ghostscript, using -dPDFSETTINGS=/screen. This setting will do quite a few things to make the file size smaller. Amongst the most important are:
    1. Re-sample all (color or grayscale) images to 72dpi
    2. Convert all colors to sRGB

Both these operations can be quite "expensive" in terms of CPU and/or RAM usage.

BTW, your setting of -dCompatibilityLevel=1.3 is not required; it's already implicitly set by -dPDFSETTINGS=/screen.

Try this:

gswin32.exe ^
 -o output.pdf ^
 -sDEVICE=pdfwrite ^
 -dPDFSETTINGS=/screen ^
 -dNumRenderingThreads=2 ^
 -dMaxPatternBitmap=1000000 ^
 -c "60000000 setvmthreshold" ^
 -f input.pdf

Also, if you are on a 64bit system, try to install the most recent 32bit Ghostscript version (9.00). It performs better than the 64bit version.

Let me tell you that downsampling a 600 dpi scanned page image to 72 dpi usually takes me less than 1 second per page, not 23.

answered Nov 14 '22 by Kurt Pfeifle


To speed up rasterizing a pdf with large bitmap graphics to a high-quality 300 ppi png image, I found that setting -dBufferSpace as high as possible and -dNumRenderingThreads to as many cores as available was the most effective for most files, with -dBufferSpace providing the most significant lift.

The specific values that worked the best were:

  • -dBufferSpace=2000000000 for 2 gigabytes of buffer space. This took the rasterization of one relatively small file from 14 minutes to just 50 seconds. For smaller files, there wasn't much difference from setting this to 1 gigabyte, but for larger files, it made a significant difference (sometimes 2x faster). Trying to go to 3 gigabytes or above for some reason resulted in an error on startup "Unrecoverable error: rangecheck in .putdeviceprops".

  • -dNumRenderingThreads=8 for a machine with 8 cores. This took the rasterization of that same file from 14 minutes to 4 minutes (and 8 minutes if using 4 threads). Combining this with the -dBufferSpace option above took it from 50 seconds to 25 seconds. When combined with -dBufferSpace, however, there appeared to be diminishing returns as the number of threads was increased, and for some files there was little effect at all. Strangely, for some larger files, setting the number of threads to 1 was actually faster than any other number.

The command overall looked like:

gs -sDEVICE=png16m -r300 -o document.png -dNumRenderingThreads=8 -dBufferSpace=2000000000 -f document.pdf

This was tested with Ghostscript 9.52, and came out of testing the suggestions in @wpgalle3's answer as well as the Improving performance section in the Ghostscript documentation.

A key takeaway from the documentation was that when ghostscript uses "banding mode" due to the raster image output being larger than the value for -dMaxBitmap, it can take advantage of multiple cores to speed up the process.

Options that were ineffective or counterproductive:

Setting -c "2000000000 setvmthreshold" (2 gigabytes) either alone or with -dBufferSpace didn't appear to make a difference.

Setting -sBandListStorage=memory resulted in a segmentation fault.

Setting -dMaxBitmap=2000000000 (2 gigabytes) significantly slowed down the process and apparently caused it to go haywire, writing hundreds of gigabytes of temporary files without any sign of stopping, prompting me to kill the process.

Setting -dBandBufferSpace to half of -dBufferSpace didn't make a difference for smaller files, but actually slowed down the process rather significantly for larger files by 1.5-1.75x. In the Banding parameters section of the Ghostscript documentation, it's actually suggested not to use -dBandBufferSpace: "if you only want to allocate more memory for banding, to increase band size and improve performance, use the BufferSpace parameter, not BandBufferSpace."
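If you are driving Ghostscript from a script, the findings above can be folded into a small helper that sizes -dNumRenderingThreads to the machine's core count. This is a sketch under my own assumptions, not part of the answer: the function name and file names are hypothetical, and the 2 GB -dBufferSpace value is simply the one reported effective above.

```python
import os

def gs_raster_args(src, dest, dpi=300, buffer_space=2_000_000_000):
    """Build a Ghostscript argument list for rasterizing a PDF to PNG.

    Hypothetical helper: sizes -dNumRenderingThreads to the core count
    and applies the large -dBufferSpace found effective above.
    """
    threads = os.cpu_count() or 1
    return [
        "gs",
        "-sDEVICE=png16m",                   # 24-bit RGB PNG output
        f"-r{dpi}",                          # raster resolution in ppi
        f"-dNumRenderingThreads={threads}",  # one thread per core
        f"-dBufferSpace={buffer_space}",     # large band buffer (bytes)
        "-o", dest,
        "-f", src,
    ]

# Usage: pass the list to subprocess.run(...) to invoke Ghostscript.
args = gs_raster_args("document.pdf", "document.png")
```

The list form avoids shell quoting issues when handed to `subprocess.run`.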

answered Nov 14 '22 by Roman Scher


I may be completely out of place here, but have you tried the DjVu file format? It works like a charm for scanned documents in general (even with lots of pictures), and it gives much smaller files: I generally see a lossless factor-of-two size reduction on B&W scientific articles.

answered Nov 14 '22 by Vincent Fourmond