
OpenCL distribution

I'm currently developing an OpenCL application for a very heterogeneous set of computers (using JavaCL, to be specific). To maximize performance, I want to use a GPU if one is available; otherwise I want to fall back to the CPU and use SIMD instructions. My plan is to implement the OpenCL code using vector types, because my understanding is that this allows CPUs to vectorize the code and use SIMD instructions.

My question, however, is which OpenCL implementation to use. E.g. if the computer has an Nvidia GPU, I assume it's best to use Nvidia's library, but if no GPU is available I want to use Intel's library to get the SIMD instructions.

How do I achieve this? Is this handled automatically or do I have to include all libraries and implement some logic to pick the right one? It feels like this is a problem that more people than I are facing.

Update: After testing the different OpenCL drivers, this is my experience so far:

  • Intel: crashed the JVM when JavaCL tried to call it. After a restart it didn't crash the JVM but it also didn't return any usable devices (I was using an Intel I7-CPU). When I compiled the OpenCL-code offline it seemed to be able to do some auto-vectorization so Intel's compiler seems quite nice.

  • Nvidia: Refused to install their WHQL drivers because the installer claimed I didn't have an Nvidia card (that computer has a GeForce GT 330M). When I tried it on a different computer I managed to get all the way to creating a kernel, but at the first execution it crashed the drivers (the screen flickered for a while and Windows 7 said it had to restart the drivers). The second execution caused a blue screen of death.

  • AMD/ATI: Refused to install 32-bit SDK (I tried that since I will be using a 32-bit JVM) but 64-bit SDK worked well. This is the only driver which I've managed to execute the code on (after a restart because at first it gave a cryptic error-message when compiling). However it doesn't seem to be able to do any implicit vectorization and since I don't have any ATI GPU I didn't get any performance increase compared to the Java-implementation. If I use vector-types I might see some improvements though.

TL;DR: None of the drivers seem ready for commercial use. I'm probably better off creating a JNI module with C code compiled to use SSE instructions.

Yrlec asked Oct 13 '11 07:10




2 Answers

First try to understand hosts & devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/

Basically you can just do exactly what you described: check if a certain driver is available and if not, try the next one. What you choose first depends completely on your own preference. I would pick the device I have tested my kernel best on. In JavaCL you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice, check the host-code here: http://ochafik.com/blog/?p=501

Note that NVidia's driver does not support CPUs; only AMD's and Intel's do. Also, targeting multiple devices (say 2 GPUs and a CPU) is a bit more difficult.
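The "try the preferred device, otherwise try the next one" idea can be sketched roughly as below. This is only an illustration under assumptions: `Device`, `Type` and `pick` are hypothetical stand-ins written in plain Java so the logic runs without any OpenCL driver installed, and the vendor names are sample data. In a real program the list would be filled from whatever platforms and devices JavaCL reports on the machine.

```java
import java.util.Arrays;
import java.util.List;

public class DeviceFallback {
    enum Type { GPU, CPU }

    // Hypothetical stand-in for a device reported by some OpenCL platform.
    static final class Device {
        final String platform;
        final Type type;

        Device(String platform, Type type) {
            this.platform = platform;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + " on " + platform;
        }
    }

    // Walk the preference order and return the first available device
    // of the most preferred type, or null if nothing matches.
    static Device pick(List<Device> available, Type... preference) {
        for (Type wanted : preference) {
            for (Device d : available) {
                if (d.type == wanted) {
                    return d;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample data only; a real list comes from enumerating the drivers.
        List<Device> found = Arrays.asList(
                new Device("Intel", Type.CPU),
                new Device("NVIDIA", Type.GPU));
        // Prefer a GPU, fall back to a CPU.
        System.out.println(pick(found, Type.GPU, Type.CPU));
    }
}
```

The preference order is passed in rather than hard-coded, so you can put the device type you have tested your kernel on most thoroughly first.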

Vincent.StreamComputing answered Sep 21 '22 12:09


There is no API that provides what you want. However, you can do the following:

I suggest you iterate over the platforms returned by clGetPlatformIDs and, for each one, query the number of devices (clGetDeviceIDs) and the device type of each device, then pick the platform which has both types. To do that, build a map in your code that maps each device type to the list of platforms supporting it, ordered in some manner. Finally, just take the first item in the list for CL_DEVICE_TYPE_CPU and the first item in the list for CL_DEVICE_TYPE_GPU. If both results are equal (platform_cpu == platform_gpu), pick that platform and use it for both.

If there is a platform supporting both, you will get a match as before, since the lists are ordered. Then you can also do load balancing on a single platform if you like, like what Intel has.
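A minimal sketch of that map, assuming the platform names and their device types have already been collected (in C that would be via clGetPlatformIDs and clGetDeviceIDs). Everything here is hypothetical illustration in plain Java: `PlatformChooser`, `index` and `choose` are made-up names, "PlatformA" and "PlatformB" are sample data, and the "ordered in some manner" rule is assumed to mean that a platform offering both device types wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PlatformChooser {
    enum Type { CPU, GPU }

    // Map each device type to the platforms offering it, in enumeration order.
    static Map<Type, List<String>> index(Map<String, List<Type>> platforms) {
        Map<Type, List<String>> byType = new EnumMap<>(Type.class);
        for (Type t : Type.values()) {
            byType.put(t, new ArrayList<String>());
        }
        for (Map.Entry<String, List<Type>> e : platforms.entrySet()) {
            for (Type t : e.getValue()) {
                List<String> names = byType.get(t);
                if (!names.contains(e.getKey())) {
                    names.add(e.getKey());
                }
            }
        }
        return byType;
    }

    // Assumed ordering rule: prefer a platform that also offers the other
    // device type; otherwise take the first platform listed for this type.
    static String choose(Map<Type, List<String>> byType, Type wanted) {
        Type other = (wanted == Type.CPU) ? Type.GPU : Type.CPU;
        List<String> names = byType.get(wanted);
        for (String name : names) {
            if (byType.get(other).contains(name)) {
                return name;
            }
        }
        return names.isEmpty() ? null : names.get(0);
    }

    public static void main(String[] args) {
        // Sample data only: PlatformA exposes a GPU, PlatformB a CPU and a GPU.
        Map<String, List<Type>> platforms = new LinkedHashMap<>();
        platforms.put("PlatformA", Arrays.asList(Type.GPU));
        platforms.put("PlatformB", Arrays.asList(Type.CPU, Type.GPU));

        Map<Type, List<String>> byType = index(platforms);
        String cpu = choose(byType, Type.CPU);
        String gpu = choose(byType, Type.GPU);
        // When both picks are the same platform, one context can serve both.
        System.out.println("cpu=" + cpu + " gpu=" + gpu + " shared=" + (cpu != null && cpu.equals(gpu)));
    }
}
```

If no single platform offers both types, `choose` simply returns the first platform listed for each type, and the two device types end up on different platforms.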

sramij answered Sep 22 '22 12:09