I've noticed that often, global
and constant
device memory is initialized to 0. Is this a universal rule? I wasn't able to find anything in the standard.
No it doesn't. For instance I had this small kernel to test atomic add:
kernel void atomicAdd(volatile global int *result){
atomic_add(&result[0], 1);
}
Calling it with this host code (pyopencl + unittest):
def test_atomic_add(self):
NDRange = (4, 4)
result = np.zeros(1, dtype=np.int32)
out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY, size=result.nbytes)
self.prog.atomicAdd(self.queue, NDRange, NDRange, out_buf)
cl.enqueue_copy(self.queue, result, out_buf).wait()
self.assertEqual(result, 16)
was always returning the correct value when using my CPU. However on a ATI HD 5450 the returned value was always junk.
And If I well recall, on an NVIDIA the first run was returning the correct value, i.e. 16, but for the following run, the values were 32, 48, etc. It was reusing the same location with the old value still stored there.
When I corrected my host code with this line (copying the 0 value to the buffer):
out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY | self.mf.COPY_HOST_PTR, hostbuf=result)
Everything worked fine on any devices.
As far as I know there is no sentence in standard that states this. Maybe some driver implementations will do this automatically, but you shoudn't rely on it.
I remember that once I had a case where a buffer was not initialized to 0, but I can't remember the settings of "OS + driver".
Probably what is going on is that the typical OS does not use even 1% of now a days devices memory. So when you start a OpenCL, there is a huge probability that you will fall into an empty zone.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With