Graphics is one of those "embarrassingly parallel" problems. Haskell is supposed to be really, really good for parallel processing. So my question is:
What is the best way to throw as many CPU cores as possible at a rendering problem?
Is it possible to get the GPU to do the task instead?
By "rendering problem", I mean problems such as:
Each pixel's colour is a pure function of its coordinates.
We start with an existing "input" image, and each "output" pixel's colour is a pure function of the corresponding input pixel, or maybe a small neighbourhood of such pixels.
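As a concrete instance of the first kind of problem, here is a minimal sketch (the names and the greyscale encoding are my own invention, not from any library) of a pixel colour computed purely from its coordinates:

```haskell
-- A pure "procedural texture": each pixel's colour depends only on (x, y).
-- Greyscale Double in [0, 1] stands in for a real colour type here.
type Colour = Double

pixelAt :: Int -> Int -> Int -> Int -> Colour
pixelAt w h x y =
  let fx = fromIntegral x / fromIntegral w
      fy = fromIntegral y / fromIntegral h
  in (sin (10 * fx) * cos (10 * fy) + 1) / 2  -- a smooth ripple pattern

-- Rendering is then just mapping pixelAt over every coordinate:
render :: Int -> Int -> [[Colour]]
render w h = [ [ pixelAt w h x y | x <- [0 .. w - 1] ] | y <- [0 .. h - 1] ]
```

Because `pixelAt` has no dependencies between pixels, every call can in principle run on a different core.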
Regarding #1: This looks like it's trivial, but actually it isn't. There are several possible choices of data structure to store the computed pixels in (which influences how you can access it, and how easily you can dump the result onto disk or screen). There are several ways to execute on multiple cores. And so on.
It seems to me that Data Parallel Haskell would be an ideal choice for this type of thing. However, last time I checked, DPH doesn't work yet. So that's that. Even assuming it did work, you would presumably create a parallel array to hold the pixels, and then you'd have to copy the pixels to display them on screen or write them to disk.
I would try sparking every single pixel, but that's probably far too fine-grained. I could make the pixels a list and use one of the parallel list strategies. Or I could make it an (unboxed?) immutable array and write some manual code to start sparks. Or I could go with explicit threads and mutable arrays. Or I could have a bunch of worker threads that stream pixel values through a channel to a master thread, which puts the results into the right place. Or...
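To make the granularity question concrete, here is one hedged sketch of the "chunk the pixels, spark each chunk" option, using only `par` and `pseq` from GHC.Conc (which ships with base). The chunk size and helper names are arbitrary choices of mine:

```haskell
import Data.List (foldl')
import GHC.Conc  (par)

-- n must be positive.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Map f over the list, sparking one chunk at a time rather than one
-- spark per pixel (which would be far too fine-grained).
parChunkMap :: Int -> (a -> b) -> [a] -> [b]
parChunkMap n f xs = concat (go (map (map f) (chunksOf n xs)))
  where
    go []       = []
    go (c : cs) = let c' = forceList c
                  in c' `par` (c' : go cs)
    -- Force every element of the chunk to WHNF so the spark does
    -- real work instead of just building a thunk.
    forceList ys = foldl' (\acc y -> y `seq` acc) () ys `seq` ys
```

Compiled with `-threaded` and run with `+RTS -N`, each spark can be picked up by an idle core; run single-threaded, the sparks simply fizzle and the result is the same as plain `map`.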
In summary, there are a surprising number of possibilities here, and I'm not sure which is best.
Regarding #2: Obviously this type of problem is the entire reason that GPUs exist in the first place. Clearly the GPU is ideally suited to attacking these kinds of problems. My question is more "is it hard to do this from Haskell?"
If you are amenable to mixing languages, then OpenCL is very versatile. Although the OpenCL language is very close to being C (so definitely not Haskell), you can write your kernel code in a more or less functional style and think of it as mapping that kernel over spatial coordinates. An advantage of doing things with a mainstream parallel programming framework like OpenCL is that you can lean on the growing volume of knowledge both HPC and graphics folks have amassed over the years across many application domains. Retargeting between the CPU and GPU is mostly painless, but you will need to be aware of considerations about data types (e.g. some GPUs don't support double precision).
I wrote a tutorial on calling into OpenCL from Haskell. It rests upon the relatively new OpenCL bindings (there are several OpenCL bindings on Hackage; I cannot attest to their relative quality).
There are raw OpenCL bindings, but if you want something that helps you run high-level code — folds and zips and maps and so on — on the GPU today, take a look at accelerate (CUDA backend) and GPipe (OpenGL backend, for graphics work; a bit bitrotten nowadays, unfortunately).
As far as structures to represent a rendered image go, an unboxed array is probably your best bet: it's the closest fit to the hardware, and you generally don't do pure "incremental" updates on a rendering.
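For instance, here is a small sketch (the type alias and function names are mine) of building a whole image in one shot as an unboxed array, using `Data.Array.Unboxed` from the standard array package:

```haskell
import Data.Array.Unboxed (UArray, listArray)

-- Greyscale image, indexed (row, column).
type Image = UArray (Int, Int) Double

-- Build the whole image once from a pure pixel function. Since there
-- are no incremental updates, an immutable unboxed array is a natural
-- fit: compact, cache-friendly, and cheap to dump to disk or screen.
renderImage :: Int -> Int -> (Int -> Int -> Double) -> Image
renderImage w h f =
  listArray ((0, 0), (h - 1, w - 1))
            [ f x y | y <- [0 .. h - 1], x <- [0 .. w - 1] ]
```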
The short answer to question 1, in the absence of more detail, is:
Write your code as normal, using a vector or array processing library.
If the library doesn't already do it for you, insert appropriate 'par' calls, or combinators built on it, to farm computations out to multiple CPUs.
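That workflow can be sketched in miniature as follows (a toy example of mine, not a rendering library; `par` and `pseq` come from GHC.Conc in base):

```haskell
import GHC.Conc (par, pseq)

-- Step 1: write the code as normal, sequentially.
sumSquares :: [Double] -> Double
sumSquares = sum . map (^ 2)

-- Step 2: insert 'par' to evaluate independent halves on separate cores.
parSumSquares :: [Double] -> Double
parSumSquares xs =
  let (lo, hi) = splitAt (length xs `div` 2) xs
      a = sumSquares lo
      b = sumSquares hi
  in a `par` (b `pseq` a + b)
```

The sequential and parallel versions compute the same result; the only change is annotating which computations may proceed in parallel.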