
CUDA or FPGA for special purpose 3D graphics computations? [closed]


We did some comparisons between FPGA and CUDA. One thing where CUDA shines is when you can really formulate your problem in a SIMD fashion AND access memory in a coalesced way. If the memory accesses are not coalesced (1), or if you have different control flow in different threads, the GPU can lose its performance drastically and the FPGA can outperform it. Another case is when your operations are relatively small but you have a huge number of them, and you cannot (e.g. due to synchronisation) start them in a loop inside one kernel; then the invocation times for the GPU kernels exceed the computation time.

The power efficiency of the FPGA can also be better (it depends on your application scenario, i.e. the GPU is only cheaper in terms of Watts/Flop when it is computing all the time).

Of course the FPGA also has some drawbacks: IO can be one (we had an application here where we needed 70 GB/s; no problem for a GPU, but to get this amount of data into an FPGA you need, with a conventional design, more pins than are available). Another drawback is time and money. An FPGA is much more expensive than the best GPU, and the development times are very high.

(1) Simultaneous accesses from different threads to memory have to be to sequential addresses. This is sometimes really hard to achieve.
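
To make footnote (1) concrete, here is a minimal CUDA sketch (my own illustration; the stride and array size are arbitrary, not taken from the answer above): in the first kernel neighbouring threads read neighbouring addresses, so the hardware can coalesce the loads into a few memory transactions, while in the second kernel the same threads stride through memory and effective bandwidth drops sharply.

```
// Illustrative only: coalesced vs. strided global memory access.

__global__ void copy_coalesced(const float* in, float* out, int n)
{
    // Thread i touches element i: adjacent threads -> adjacent addresses.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

__global__ void copy_strided(const float* in, float* out, int n, int stride)
{
    // Thread i touches element i*stride: adjacent threads -> scattered
    // addresses, so the accesses cannot be coalesced.
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```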


I investigated the same question a while back. After chatting to people who have worked on FPGAs, this is what I got:

  • FPGAs are great for real-time systems, where even 1 ms of delay might be too long. This does not apply in your case;
  • FPGAs can be very fast, especially for well-defined digital signal processing tasks (e.g. radar data), but the good ones are much more expensive and specialised than even professional GPGPUs;
  • FPGAs are quite cumbersome to program. Since compiling includes a hardware configuration step, it can take hours. They seem more suited to electronic engineers (who are generally the ones who work on FPGAs) than to software developers.

If you can make CUDA work for you, it's probably the best option at the moment. It will certainly be more flexible than an FPGA.

Other options include Brook from ATI, but until something big happens, it is simply not as well adopted as CUDA. After that, there are still all the traditional HPC options (clusters of x86/PowerPC/Cell), but they are all quite expensive.

Hope that helps.


I would go with CUDA.
I work in image processing and have been trying hardware add-on boards for years. First we had the i860, then the Transputer, then DSPs, then FPGAs and direct compilation to hardware.
What inevitably happened was that by the time the hardware boards were properly debugged and reliable and the code had been ported to them, regular CPUs had advanced to beat them, or the host machine architecture had changed and we couldn't use the old boards, or the maker of the board had gone bust.

By sticking to something like CUDA you aren't tied to one small specialist maker of FPGA boards. The performance of GPUs is improving faster than that of CPUs, and the development is funded by the gamers. It's a mainstream technology, so it will probably merge with multi-core CPUs in the future and thereby protect your investment.


FPGAs

  • What you need:
    • Learn VHDL/Verilog (and trust me, you don't want to)
    • Buy hardware for testing, plus licences for the synthesis tools
    • If you already have the infrastructure and only need to develop your core:
      • Develop the design (and it can take years)
    • If you don't:
      • DMA, hardware drivers, ultra-expensive synthesis tools
      • Tons of knowledge about buses, memory mapping, hardware synthesis
      • Build the hardware, buy the IP cores
      • Develop the design
      • Not to mention board development
  • For example, an average FPGA PCIe card with a Xilinx Zynq UltraScale+ chip costs more than $3000
  • FPGA cloud instances are also costly, at $2/h and up
  • Result:
    • This is something that requires the resources of an established company, at least.

GPGPU (CUDA/OpenCL)

  • You already have hardware to test on.
  • Compared to the FPGA world:
    • Everything is well documented.
    • Everything is cheap.
    • Everything works.
    • Everything is well integrated with programming languages.
  • There are GPU cloud offerings as well.
  • Result:
    • You just need to download the SDK and you can start (see the minimal sketch below).
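
As a rough illustration of how low that entry barrier is (a generic vector-add sketch of my own, not tied to the 3D workload in the question): once the CUDA toolkit is installed, a complete working program fits in a couple of dozen lines and compiles directly with nvcc.

```
#include <cstdio>
#include <cuda_runtime.h>

// Minimal end-to-end CUDA program: add two vectors on the GPU.
__global__ void vec_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                      // arbitrary example size
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;                           // unified memory keeps the sketch short
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);                // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```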

This is an old thread started in 2008, but it would be good to recount what has happened to FPGA programming since then:

  1. C-to-gates on FPGAs is now a mainstream flow for many companies, with huge time savings vs. Verilog/SystemVerilog HDL. With C-to-gates, system-level design is the hard part.
  2. OpenCL on FPGAs has been available for 4+ years, including floating point and "cloud" deployment by Microsoft (Azure) and Amazon F1 (Ryft API). With OpenCL, system design is relatively easy because of the very well defined memory model and API between the host and the compute devices.

Software folks just need to learn a bit about FPGA architecture to be able to do things that are NOT EVEN POSSIBLE with GPUs and CPUs, because both are fixed silicon and lack broadband (100 Gb+) interfaces to the outside world. Scaling down chip geometry is no longer possible, nor is extracting more heat from a single chip package without melting it, so this looks like the end of the road for single-package chips. My thesis here is that the future belongs to parallel programming of multi-chip systems, and FPGAs have a great chance to be ahead of the game. Check out http://isfpga.org/ if you have concerns about performance, etc.


An FPGA-based solution is likely to be far more expensive than CUDA.


Obviously this is a complex question. The question might also include the Cell processor, and there is probably no single answer that is correct for this and other related questions.

In my experience, any implementation done in an abstract fashion, i.e. a compiled high-level language vs. a machine-level implementation, will inevitably carry a performance cost, especially for a complex algorithm. This is true of both FPGAs and processors of any type. An FPGA designed specifically to implement a complex algorithm will perform better than an FPGA whose processing elements are generic, giving it a degree of programmability via input control registers, data I/O, etc.

Another general example where an FPGA can deliver much higher performance is cascaded processes, where one process's outputs become the inputs to another and they cannot be run concurrently. Cascading processes in an FPGA is simple and can dramatically lower memory I/O requirements, whereas on a processor, memory is used to effectively cascade two or more processes that have data dependencies (see the sketch below).
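
On a GPU, the closest thing to cascading stages inside an FPGA fabric is fusing the stages into a single kernel, so the intermediate result stays in registers instead of making a round trip through global memory. A hedged sketch of the pattern (the stage functions f and g are placeholders I invented for illustration):

```
__device__ float f(float x) { return 2.0f * x; }   // stage 1 (placeholder)
__device__ float g(float x) { return x + 1.0f; }   // stage 2 (placeholder)

// Unfused: two kernel launches, with the intermediate buffer `tmp`
// written to and read back from global memory between the stages.
__global__ void stage1(const float* in, float* tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = f(in[i]);
}
__global__ void stage2(const float* tmp, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = g(tmp[i]);
}

// Fused: the intermediate value never leaves a register, which roughly
// halves the global memory traffic for this two-stage pipeline.
__global__ void fused(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = g(f(in[i]));
}
```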

The same can be said of a GPU and a CPU. Algorithms implemented in C and run on a CPU, developed without regard to the inherent performance characteristics of the cache or main memory system, will not perform as well as implementations that take them into account. Granted, ignoring these performance characteristics simplifies implementation, but at a performance cost (a small example follows).
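
A classic host-side example of that point (plain C++ that also compiles with nvcc; the matrix size is an assumption of mine): summing a row-major matrix column by column touches a new cache line on nearly every access, while summing it row by row streams through memory and is typically several times faster, even though both loops perform exactly the same arithmetic.

```
#include <vector>

const int N = 4096;   // arbitrary example size (row-major N x N matrix)

// Cache-friendly: the inner loop walks consecutive addresses.
float sum_row_major(const std::vector<float>& m)
{
    float s = 0.0f;
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            s += m[r * N + c];
    return s;
}

// Cache-hostile on a row-major layout: the inner loop jumps N floats
// at a time, so nearly every access needs a different cache line.
float sum_column_major(const std::vector<float>& m)
{
    float s = 0.0f;
    for (int c = 0; c < N; ++c)
        for (int r = 0; r < N; ++r)
            s += m[r * N + c];
    return s;
}
```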

I have no direct experience with GPUs, but knowing their inherent memory system performance characteristics, I expect they too will be subject to these issues.