
Fastest PNG decoder for .NET

Our web server needs to composite many large images together before sending the results to web clients. This process is performance critical because the server can receive several thousand requests per hour.

Right now our solution loads PNG files (around 1MB each) from the hard disk and sends them to the video card so the composition is done on the GPU. We first tried loading our images using the PNG decoder exposed by the XNA API, but the performance was poor.

To find out whether the problem was loading from the hard disk or decoding the PNG, we changed the code to load the file into a memory stream and then pass that memory stream to the .NET PNG decoder. The performance difference between XNA and the System.Windows.Media.Imaging.PngBitmapDecoder class is not significant; both perform at roughly the same level.

Our benchmarks show the following performance results:

  • Load images from disk: 37.76 ms (1%)
  • Decode PNGs: 2816.97 ms (77%)
  • Load images on video hardware: 196.67 ms (5%)
  • Composition: 87.80 ms (2%)
  • Get composition result from video hardware: 166.21 ms (5%)
  • Encode to PNG: 318.13 ms (9%)
  • Store to disk: 3.96 ms (0%)
  • Clean up: 53.00 ms (1%)

Total: 3680.50 ms (100%)

From these results we see that the slowest part by far is decoding the PNGs.

So we are wondering whether there is a faster PNG decoder we could use to reduce the decoding time. We also considered keeping the images uncompressed on the hard disk, but then each image would be 10MB instead of 1MB, and since several tens of thousands of these images are stored on the hard disk, it is not possible to store them all without compression.

EDIT: More useful information:

  • The benchmark simulates loading 20 PNG images and compositing them together. This will roughly correspond to the kind of requests we will get in the production environment.
  • Each image used in the composition is 1600x1600 in size.
  • The solution will involve as many as 10 load-balanced servers like the one we are discussing here, so extra software development effort could be worth the savings on hardware costs.
  • Caching the decoded source images is something we are considering, but each composition will most likely be done with completely different source images, so cache misses will be high and performance gain, low.
  • The benchmarks were done with a crappy video card, so we can expect the PNG decoding to be even more of a performance bottleneck using a decent video card.
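
As a rough sanity check on the figures above, here is a minimal C# sketch of the arithmetic. It assumes the decoded pixels are 32-bit RGBA (4 bytes per pixel) and that requests are processed one at a time; the class and method names are made up for illustration.

```csharp
using System;

public static class CompositionEstimates
{
    // Figures taken from the question.
    public const int Width = 1600;
    public const int Height = 1600;
    public const int ImagesPerRequest = 20;
    public const double DecodeMsPerRequest = 2816.97;
    public const double TotalMsPerRequest = 3680.50;

    // Assumption: decoded pixels are 32-bit RGBA.
    public const int BytesPerPixel = 4;

    // Size in bytes of one fully decoded image.
    public static long DecodedBytesPerImage()
    {
        return (long)Width * Height * BytesPerPixel;
    }

    // Average time spent decoding a single PNG in the benchmark.
    public static double DecodeMsPerImage()
    {
        return DecodeMsPerRequest / ImagesPerRequest;
    }

    // Requests per hour if one request is processed at a time.
    public static double SequentialRequestsPerHour()
    {
        return 3600000.0 / TotalMsPerRequest;
    }

    public static void Main()
    {
        Console.WriteLine("Decoded bytes per image:   {0}", DecodedBytesPerImage());
        Console.WriteLine("Decoded bytes per request: {0}", DecodedBytesPerImage() * ImagesPerRequest);
        Console.WriteLine("Decode ms per image:       {0:F1}", DecodeMsPerImage());
        Console.WriteLine("Requests per hour (seq.):  {0:F0}", SequentialRequestsPerHour());
    }
}
```

Under those assumptions one decoded image is 10,240,000 bytes, which matches the 10MB estimate, each PNG takes about 141 ms to decode, and a single sequential pipeline handles a little under 1,000 requests per hour.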
sboisse asked Jul 03 '12 14:07


2 Answers

There is another option: write your own GPU-based PNG decoder. You could use OpenCL to perform this operation fairly efficiently (and perform your composition using OpenGL, which can share resources with OpenCL). It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can and want to pursue, I can provide more information.

Here are some resources related to GPU-based DEFLATE (and INFLATE).

  1. Accelerating Lossless compression with GPUs
  2. gpu-block-compression using CUDA on Google Code.
  3. Floating point data-compression at 75 Gb/s on a GPU - note that this doesn't use INFLATE/DEFLATE but a novel parallel compression/decompression scheme that is more GPU-friendly.

Hope this helps!

Ani answered Sep 21 '22 13:09


Have you tried the following two things?

1) Multithread it. There are several ways of doing this, but one would be an "all in" method: spawn X threads, each of which runs the full process.

2) Alternatively, have XX threads do all the CPU work and then feed the results to a single GPU thread.
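
A minimal sketch of option 2, assuming .NET 4 or later. The `decode` and `upload` delegates are hypothetical placeholders for your PNG decoder and your GPU upload code; neither is a real XNA or WPF call. Several CPU workers decode in parallel and push the results into a bounded queue that a single GPU thread drains.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DecodePipeline
{
    // Decodes on up to `workerCount` CPU workers and hands every decoded
    // buffer to `upload` on one consumer task (standing in for the GPU thread).
    // Returns the number of buffers uploaded.
    public static int Run(
        IEnumerable<string> paths,
        Func<string, byte[]> decode,  // hypothetical: wraps your PNG decoder
        Action<byte[]> upload,        // hypothetical: sends pixels to the GPU
        int workerCount)
    {
        // Bounded so the decoders cannot run far ahead of the GPU thread
        // and fill memory with large decoded buffers.
        using (var decoded = new BlockingCollection<byte[]>(workerCount * 2))
        {
            int uploaded = 0;

            // Single consumer: the only code that touches the graphics device.
            Task gpuTask = Task.Factory.StartNew(() =>
            {
                foreach (byte[] pixels in decoded.GetConsumingEnumerable())
                {
                    upload(pixels);
                    uploaded++;
                }
            }, TaskCreationOptions.LongRunning);

            try
            {
                // Producers: decode in parallel and enqueue the results.
                Parallel.ForEach(
                    paths,
                    new ParallelOptions { MaxDegreeOfParallelism = workerCount },
                    path => decoded.Add(decode(path)));
            }
            finally
            {
                // Lets the consumer loop finish once the queue is drained.
                decoded.CompleteAdding();
            }

            gpuTask.Wait();
            return uploaded;
        }
    }

    public static void Main()
    {
        var paths = new[] { "a.png", "b.png", "c.png" }; // placeholder names
        int n = Run(paths, p => new byte[16], pixels => { }, Environment.ProcessorCount);
        Console.WriteLine("Uploaded {0} buffers", n);
    }
}
```

Note that buffers reach the GPU thread in completion order, not input order, so you would need to carry an index along if the layer order matters for the composition. Handling of a failing `upload` is left out to keep the sketch short.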

Your question is very well formulated for a new user, but some more information about the scenario would be useful. Are we talking about a batch job or serving pictures in real time? Do the 10k pictures change?

Hardware resources
You should also take into account what hardware resources you have at your disposal. Normally the two cheapest things are CPU power and disk space, so if you only have 10k pictures that rarely change, then converting them all into a format that is quicker to handle might be the way to go.

Multi thread trivia
Another thing to consider when multithreading is that it is normally smart to run the threads at BelowNormal priority, so you don't make the entire system lag. You have to experiment a bit with the number of threads to use. If you are lucky you can get close to a 100% gain in speed per core, but this depends a lot on the hardware and the code you are running.

I normally use Environment.ProcessorCount to get the current CPU count and work from there :)
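
Something along these lines, as a rough sketch: one BelowNormal-priority thread per logical processor, which is what Environment.ProcessorCount reports. The `BackgroundWorkers` class and the `work` callback are made-up names, and the right thread count for your workload is something you would have to measure.

```csharp
using System;
using System.Threading;

public static class BackgroundWorkers
{
    // Starts one BelowNormal-priority thread per logical processor, runs
    // `work(workerIndex)` on each, and waits for all of them to finish.
    // Returns the number of threads that were started.
    public static int RunOnAllCores(Action<int> work)
    {
        int count = Environment.ProcessorCount;
        var threads = new Thread[count];

        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            threads[i] = new Thread(() => work(index))
            {
                // Lower priority so the rest of the system stays responsive.
                Priority = ThreadPriority.BelowNormal,
                IsBackground = true
            };
            threads[i].Start();
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
        return count;
    }

    public static void Main()
    {
        int n = RunOnAllCores(i => Console.WriteLine("worker {0} running", i));
        Console.WriteLine("Started {0} workers", n);
    }
}
```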

EKS answered Sep 23 '22 13:09