
Maximum memory allocation size in OpenCL only a quarter of available main memory--why?

Tags: opencl

For the device info parameter CL_DEVICE_MAX_MEM_ALLOC_SIZE, the OpenCL standard (2.0, similar in earlier versions) has this to say:

Max size of memory object allocation in bytes. The minimum value is max (min(1024*1024*1024, 1/4th of CL_DEVICE_GLOBAL_MEM_SIZE), 128*1024*1024) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
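To make the quoted lower bound concrete, here is a minimal sketch that evaluates the formula for a given global memory size. The function name `min_required_max_alloc` is made up for illustration, and integer division stands in for "1/4th". It only reproduces the spec text quoted above, not what any particular driver actually reports.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def min_required_max_alloc(global_mem_size: int) -> int:
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE the quoted OpenCL 2.0 text permits.

    global_mem_size is CL_DEVICE_GLOBAL_MEM_SIZE in bytes.
    Hypothetical helper; applies to non-CUSTOM devices only.
    """
    quarter = global_mem_size // 4  # assumption: "1/4th" as integer division
    return max(min(GIB, quarter), 128 * MIB)


if __name__ == "__main__":
    for total in (256 * MIB, 2 * GIB, 8 * GIB):
        bound = min_required_max_alloc(total)
        print(f"global = {total // MIB} MiB, required minimum = {bound // MIB} MiB")
```

By this formula, a device with 8 GiB of global memory only has to offer 1 GiB per allocation, so an implementation reporting 2 GiB is already above the required minimum.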

It turns out that both the AMD and Intel CPU OpenCL implementations only offer up a quarter of the available memory (about 2 GiB on my machine with 8 GiB, and similarly on other machines) to allocate at one time. I don't see a good technical justification for this. I'm aware that AMD GPUs have similar restrictions, controlled by the GPU_MAX_ALLOC_PERCENT environment variable, but even there, I don't quite see where the difficulty is with just offering up all memory for allocation.

To sum up: What is the technical reason for restricting the amount of memory being allocated at one time? After all, I can malloc() all my memory on the CPU in one big gulp. Is there perhaps some performance concern I'm not understanding?

Andreas Klöckner asked Dec 05 '13 19:12


1 Answer

AMD GPUs use a segmented memory model in hardware with a limit on the size of each segment imposed by the size of the hardware registers used to access the memory. However, OpenCL requires a non-segmented global memory model to be presented by the OpenCL implementation. Therefore to pass conformance in all cases, AMD must restrict global memory to lie within the same hardware memory segment, i.e. present a reduced CL_DEVICE_MAX_MEM_ALLOC_SIZE.

If you increase the amount of GPU memory available to the CL runtime, AMD's compiler will try to split memory buffers across different hardware memory segments to make things work, e.g. with 512 MB total you may be able to correctly use two 256 MB buffers but not a single 512 MB buffer.
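As a rough illustration of why two 256 MB buffers can work where one 512 MB buffer cannot, here is a toy model of segment placement. It assumes two 256 MiB segments (inferred from the example above, not from AMD documentation) and a simple first-fit rule in which each buffer must lie entirely inside one segment. The function `place_buffers` is hypothetical and does not reflect how AMD's runtime is actually implemented.

```python
MIB = 1024 * 1024


def place_buffers(buffer_sizes, segment_size, num_segments):
    """First-fit placement of buffers into fixed-size segments.

    Returns a list with one segment index per buffer, or None if some
    buffer cannot be placed without spanning a segment boundary.
    Toy model only; segment sizes and policy are assumptions.
    """
    free = [segment_size] * num_segments  # bytes still free in each segment
    placement = []
    for size in buffer_sizes:
        for index, room in enumerate(free):
            if size <= room:
                free[index] -= size
                placement.append(index)
                break
        else:
            # No single segment can hold this buffer.
            return None
    return placement


if __name__ == "__main__":
    print(place_buffers([256 * MIB, 256 * MIB], 256 * MIB, 2))
    print(place_buffers([512 * MIB], 256 * MIB, 2))
```

In this model the total capacity is the same in both calls. Only the per-segment limit makes the single large buffer fail, which is the kind of restriction a reduced CL_DEVICE_MAX_MEM_ALLOC_SIZE would express.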

I believe in more recent hardware the segment size increases.

On the CPU side: are you running a 32-bit or a 64-bit program? Based on your last comment about malloc() I'm assuming 64-bit, so the usual 32-bit limitations don't apply. However, AMD and Intel may internally use 32-bit variables for memory sizes and be unable or unwilling to migrate their code to be fully 64-bit. That's pure speculation, though.

user2746401 answered Oct 18 '22 09:10