
ImageMagick: how to achieve low memory usage while resizing a large number of image files?

I would like to resize a large number (about 5200) of image files (PPM format, each 5 MB in size) and save them to PNG format using convert.

Short version:

convert blows up 24 GB of memory although I use the syntax that tells convert to process image files consecutively.

Long version:

Regarding more than 25 GB of image data, I figure I should not process all files simultaneously. I searched the ImageMagick documentation about how to process image files consecutively and I found:

It is faster and less resource intensive to resize each image as it is read:

$ convert '*.jpg[120x120]' thumbnail%03d.png

Also, the tutorial states:

For example instead of...

montage '*.tiff' -geometry 100x100+5+5 -frame 4 index.jpg

which reads all the tiff files in first, then resizes them. You can instead do...

montage '*.tiff[100x100]' -geometry 100x100+5+5 -frame 4 index.jpg

This will read each image in, and resize them, before proceeding to the next image. Resulting in far less memory usage, and possibly prevent disk swapping (thrashing), when memory limits are reached.

Hence, this is what I am doing:

$ convert '*.ppm[1280x1280]' pngs/%05d.png

According to the docs, it should treat each image file one by one: read, resize, write. I am doing this on a machine with 12 real cores and 24 GB of RAM. However, during the first two minutes, the memory usage of the convert process grows to about 96 %. It stays there a while. CPU usage is at maximum. A bit longer and the process dies, just saying:

Killed

At this point, no output files have been produced. I am on Ubuntu 10.04 and convert --version says:

Version: ImageMagick 6.5.7-8 2012-08-17 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC
Features: OpenMP 

It looks like convert tries to read all of the data before starting the conversion. So either there is a bug in convert, there is an issue with the documentation, or I did not read the documentation properly.

What is wrong? How can I achieve low memory usage while resizing this large number of image files?

BTW: a quick solution would be to just loop over the files using the shell and invoke convert for each file independently. But I'd like to understand how to achieve the same with pure ImageMagick.
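For completeness, here is a minimal sketch of that shell loop (assuming bash, file names without spaces, and the same pngs/%05d.png naming as above):

mkdir -p pngs
i=0
for f in *.ppm; do
    # one convert invocation per file: read, resize, write, then exit,
    # so only a single image is ever held in memory
    convert "${f}[1280x1280]" "$(printf 'pngs/%05d.png' "$i")"
    ((i++))
done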

Thanks!

asked Sep 10 '12 by Dr. Jan-Philip Gehrcke


3 Answers

Without direct access to your system it's really hard to help you debug this.

But you can do three things to help yourself narrow down the problem:

  1. Add -monitor as the first commandline argument to see more details about what's going on.

  2. (Optionally) add -debug all -log "domain: %d +++ event: %e +++ function: %f +++ line: %l +++ module: %m +++ processID: %p +++ realCPUtime: %r +++ wallclocktime: %t +++ userCPUtime: %u \n\r"

  3. Temporarily, don't use '*.ppm[1280x1280]' as an argument; use 'a*.ppm[1280x1280]' (or some other suitably restrictive pattern) instead. The purpose is to limit your wildcard expansion to only a few matches, instead of all possible matches.

If you do '2.' you'll need to do '3.' as well, otherwise you'll be overwhelmed by the mass of output. (Also, your system does not seem to be able to process the full wildcard anyway without the process getting killed...) A combined example of '1.' and '3.' is sketched below.
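For instance, combining suggestions 1 and 3 into a restricted test run could look like this (the a*.ppm prefix is only an example of a pattern that matches a handful of files):

convert -monitor 'a*.ppm[1280x1280]' pngs/%05d.png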

If you do not find a solution, then...

  1. ...register a username at the official ImageMagick bug report forum.
  2. ...report your problem there to see if they can help you (these guys are rather friendly and responsive if you ask politely).
answered by Kurt Pfeifle


I had the same issue; it seems to happen because ImageMagick creates temporary files in the /tmp directory, which is often mounted as a tmpfs.

Just move your tmp somewhere else.

For example:

  • create a "tmp" directory on a big external drive

    mkdir -m777 /media/huge_device/tmp

  • make sure the permissions are set to 777

    chmod 777 /media/huge_device/tmp

  • as root, bind-mount it over your /tmp

    mount -o bind /media/huge_device/tmp /tmp

Note: it should be possible to use the TMP environment variable to do the same trick.
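For example, a minimal sketch of the environment-variable variant, assuming the same /media/huge_device/tmp directory as above and an ImageMagick build that honours MAGICK_TMPDIR (the variable ImageMagick documents for relocating its temporary files):

MAGICK_TMPDIR=/media/huge_device/tmp convert '*.ppm[1280x1280]' pngs/%05d.png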

answered by Cecile


I would go with GNU Parallel if you have 12 cores - something like the following, which works very well. As it processes only 12 images at a time, while still preserving your output file numbering, it uses only minimal RAM.

scene=0
for f in *.ppm; do
   echo "$f" $scene
   ((scene++))
done | parallel -j 12 --colsep ' ' --eta convert {1}[1280x1280] -scene {2} pngs/%05d.png

Notes

-scene lets you set the scene counter, which comes out in your %05d part.

--eta predicts when your job will be done (Estimated Time of Arrival).

-j 12 runs 12 jobs in parallel at a time.
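If the sequential %05d numbering is not essential, a simpler variant keeps the original file names instead. This is only a sketch, assuming GNU Parallel's default {} and {.} replacement strings and .ppm names without spaces:

parallel convert '{}[1280x1280]' 'pngs/{.}.png' ::: *.ppm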

answered by Mark Setchell