I have a 100-page PDF that is about 50 MB. I am running the script below against it, and it's taking about 23 seconds per page. The PDF is a scan of a paper document.
gswin32.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3
-dPDFSETTINGS=/screen -sOutputFile=out4.pdf 09.pdf
Is there anything I can do to speed this up? I've determined that the -dPDFSETTINGS=/screen
option is what is making it so slow, but I'm not getting good compression without it...
UPDATE:
OK, I tried updating it to what I have below. Am I using the -c 30000000 setvmthreshold
portion correctly?
gswin32.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3
-dPDFSETTINGS=/screen -dNumRenderingThreads=2 -sOutputFile=out7.pdf
-c 30000000 setvmthreshold -f 09.pdf
If you are on a multicore system, make it use multiple CPU cores with:
-dNumRenderingThreads=<number of cpus>
Let it use up to 30 MB of RAM:
-c "30000000 setvmthreshold"
Try disabling the garbage collector:
-dNOGC
For more details, see the Improving Performance
section of the Ghostscript docs.
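Putting those three tips together, the whole invocation might look something like this (a sketch based on the question's file names; this assumes a 2-core machine, so adjust -dNumRenderingThreads to your actual core count):

```shell
# Sketch: combine the tips above on top of the question's original command.
# Assumptions: 2 CPU cores, input 09.pdf, output out.pdf.
gswin32.exe -dNOPAUSE -dBATCH -dNOGC ^
  -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 ^
  -dPDFSETTINGS=/screen ^
  -dNumRenderingThreads=2 ^
  -sOutputFile=out.pdf ^
  -c "30000000 setvmthreshold" -f 09.pdf
```

Note that the PostScript snippet passed with -c must come before -f, which introduces the input file, as in the question's update.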
I was crunching a ~300-page PDF on a Core i7 and found that adding the following options provided a significant speedup:
%-> comments to the right
-dNumRenderingThreads=8 % increasing up to 64 didn't make much difference
-dBandHeight=100 % didn't matter much
-dBandBufferSpace=500000000 % (500MB)
-sBandListStorage=memory % may or may not need to be set when gs is compiled
-dBufferSpace=1000000000 % (1GB)
The -c 1000000000 setvmthreshold -f
thing didn't make much difference for me, FWIW.
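For reference, the options listed above combine into a single command along these lines (a sketch only: the file names are hypothetical, and it assumes an 8-core machine with enough free RAM to cover the roughly 1.5 GB handed to the band buffers):

```shell
# Sketch: the banding options above in one command.
# Assumptions: 8 cores, ~1.5 GB RAM to spare, hypothetical in.pdf/out.pdf.
gs -dNOPAUSE -dBATCH \
   -sDEVICE=pdfwrite -dPDFSETTINGS=/screen \
   -dNumRenderingThreads=8 \
   -dBandHeight=100 \
   -dBandBufferSpace=500000000 \
   -sBandListStorage=memory \
   -dBufferSpace=1000000000 \
   -sOutputFile=out.pdf in.pdf
```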
You don't say what CPU and what amount of RAM your computer is equipped with.
Your situation is this:
-dPDFSETTINGS=/screen
This setting will do quite a few things to make the file size smaller. Amongst the most important are downsampling all images to 72 dpi and re-encoding (re-compressing) the image data.
Both these operations can be quite "expensive" in terms of CPU and/or RAM usage.
BTW, your setting of -dCompatibilityLevel=1.3
is not required; it's already implicitly set by -dPDFSETTINGS=/screen.
Try this:
gswin32.exe ^
-o output.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/screen ^
-dNumRenderingThreads=2 ^
-dMaxPatternBitmap=1000000 ^
-c "60000000 setvmthreshold" ^
-f input.pdf
Also, if you are on a 64-bit system, try installing the most recent 32-bit Ghostscript version (9.00). It performs better than the 64-bit version.
Let me tell you that downsampling a 600dpi scanned page image to 72dpi usually does not take 23 seconds for me, but less than 1.
To speed up rasterizing a pdf with large bitmap graphics to a high-quality 300 ppi png image, I found that setting -dBufferSpace
as high as possible and -dNumRenderingThreads
to as many cores as available was the most effective for most files, with -dBufferSpace
providing the most significant lift.
The specific values that worked the best were:
-dBufferSpace=2000000000
for 2 gigabytes of buffer space. This took the rasterization of one relatively small file from 14 minutes to just 50 seconds. For smaller files, there wasn't much difference from setting this to 1 gigabyte, but for larger files, it made a significant difference (sometimes 2x faster). Trying to go to 3 gigabytes or above for some reason resulted in an error on startup "Unrecoverable error: rangecheck in .putdeviceprops".
-dNumRenderingThreads=8
for a machine with 8 cores. This took the rasterization of that same file from 14 minutes to 4 minutes (and 8 minutes if using 4 threads). Combining this with the -dBufferSpace
option above took it from 50 seconds to 25 seconds. When combined with -dBufferSpace
however, there appeared to be diminishing returns as the number of threads was increased, and for some files there was little effect at all. Strangely, for some larger files, setting the number of threads to 1 was actually faster than any other number.
The command overall looked like:
gs -sDEVICE=png16m -r300 -o document.png -dNumRenderingThreads=8 -dBufferSpace=2000000000 -f document.pdf
This was tested with Ghostscript 9.52, and came out of testing the suggestions in @wpgalle3's answer as well as the Improving performance section in the Ghostscript documentation.
A key takeaway from the documentation was that when ghostscript uses "banding mode" due to the raster image output being larger than the value for -dMaxBitmap
, it can take advantage of multiple cores to speed up the process.
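One way to make sure banding mode actually kicks in, so the rendering threads have bands to split between them, is to keep -dMaxBitmap below the size of the full page raster. A sketch with hypothetical file names, assuming the same 8-core setup as above:

```shell
# Sketch: keep -dMaxBitmap small so Ghostscript switches to banding mode,
# then let 8 threads render the bands in parallel (hypothetical file names).
gs -sDEVICE=png16m -r300 -o page-%03d.png \
   -dMaxBitmap=10000000 \
   -dNumRenderingThreads=8 -dBufferSpace=2000000000 \
   -f document.pdf
```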
Options that were ineffective or counterproductive:
Setting -c "2000000000 setvmthreshold"
(2 gigabytes) either alone or with -dBufferSpace
didn't appear to make a difference.
Setting -sBandListStorage=memory
resulted in a segmentation fault.
Setting -dMaxBitmap=2000000000
(2 gigabytes) significantly slowed down the process and apparently caused it to go haywire, writing hundreds of gigabytes of temporary files with no sign of stopping, prompting me to kill the process.
Setting -dBandBufferSpace
to half of -dBufferSpace
didn't make a difference for smaller files, but actually slowed down the process rather significantly for larger files by 1.5-1.75x. In the Banding parameters section of the Ghostscript documentation, it's actually suggested not to use -dBandBufferSpace
: "if you only want to allocate more memory for banding, to increase band size and improve performance, use the BufferSpace parameter, not BandBufferSpace."
I may be completely out of place here, but have you given the DjVu file format a try? It works like a charm for scanned documents in general (even if there are lots of pictures), and it gives much better compressed files: I generally get a factor-of-two lossless gain in size on B&W scientific articles.