
Why must program-scope (global) variables be __constant?

I am new to OpenCL and really confused by this restriction. For example, if I want to write an LCG, the state word has to be modifiable by both rand() and srand(). In ANSI C, I would do that with something like:

/* ANSI C */
static unsigned long _holdrand = 1; /* Global! */

unsigned long rand(){
    _holdrand = _holdrand * 214013L + 2531011L;
    return (_holdrand >> 16) & 0x7FFF; 
}
void srand( unsigned long seed ){
    _holdrand = seed;
}

But OpenCL restricts all program-scope variables to __constant. I could move _holdrand into function scope and return a pointer to it from that function.

/* OpenCL C */
uint* holdrand(){
    __private static uint _holdrand = 1;
    return &_holdrand;
}

uint rand(){
    *holdrand() = *holdrand() * 214013L + 2531011L;
    return (*holdrand() >> 16) & 0x7FFF; 
}
void srand( uint seed ){
    *holdrand() = seed;
}

It works fine, but I don't know whether this is a good solution. The restriction makes no sense to me; I just worked around it by adding more weird code.

__private uint _holdrand = 1;
/* It should be the same thing... Why is this not allowed? */

Since returning a pointer to a static variable behaves exactly like the global-variable approach in ANSI C, I can't understand what the restriction is for. Could someone explain why it exists? Did I miss something? What should I do to make _holdrand modifiable from two different functions in this example?

Aean asked Mar 18 '14 05:03


1 Answer

Briefly: the lifetime and memory layout of an OpenCL program differ from those of a C program. In OpenCL you don't have a stack, a heap, and so on. Constant memory is (usually) a small amount of very fast on-chip memory, with access times of the same order as register operations, so it may restrict write operations from work-items.

Every NDRange (usually) contains thousands of work-items (WIs). Imagine the performance you would get if, say, 512 threads were reading and writing the same variable. That's why there are four address spaces:

  • __private: private to each WI
  • __local: shared by all WIs inside a work-group
  • __global: visible to all WIs within the NDRange
  • __constant: global read-only variables

If your rand() and srand() functions are WI-specific, you should keep their state in private memory. Alternatively, you can keep the variables you need in the global address space, but in that case be very careful about race conditions.
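
One way to act on this advice is to stop hiding the state in a program-scope variable and pass it to both functions explicitly. The following is a minimal sketch in plain C, not code from the question or from any OpenCL library; the names lcg_srand, lcg_rand, and fill_per_item are made up for illustration. In OpenCL C 1.x the two generator functions should carry over with uint in place of uint32_t, because an unqualified pointer parameter there refers to the __private address space, so a kernel can keep the state in one of its own local variables. The loop in fill_per_item only stands in for work-items: each index owns one state slot, which is the pattern that avoids races if the states are kept in a __global buffer instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Seed a generator whose state is owned by the caller. */
void lcg_srand(uint32_t *state, uint32_t seed) {
    *state = seed;
}

/* Advance the caller-owned state and return a 15-bit value.
   Same constants as in the question; uint32_t arithmetic wraps mod 2^32. */
uint32_t lcg_rand(uint32_t *state) {
    *state = *state * 214013u + 2531011u;
    return (*state >> 16) & 0x7FFFu;
}

/* Hypothetical stand-in for an NDRange: index i plays the role of a
   work-item id. Each "work-item" seeds and uses only states[i], so no
   two of them ever touch the same variable. */
void fill_per_item(uint32_t *states, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        lcg_srand(&states[i], (uint32_t)i + 1u);
        out[i] = lcg_rand(&states[i]);
    }
}
```

Seeded with 1, the first two calls to lcg_rand return 41 and 18467. Because every generator touches only the state it was handed, nothing is shared between callers and the question of which address space a "global" would live in does not arise.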

OpenCL has to run on a wide variety of devices, which is why some of its restrictions look overly strict.

Roman Arzumanyan answered Oct 06 '22 01:10