
Misaligned address in CUDA

Can anyone tell me what's wrong with the following code inside a CUDA kernel?

__constant__ unsigned char MT[256] = {
    0xde, 0x6f, 0x6f, 0xb1, 0xde, 0x6f, 0x6f, 0xb1, 0x91, 0xc5, 0xc5, 0x54, 0x91, 0xc5, 0xc5, 0x54,....};

typedef unsigned int U32;

__global__ void Kernel (unsigned int  *PT, unsigned int  *CT, unsigned int  *rk)
{

    long int i;
    __shared__ unsigned char sh_MT[256];    

    for (i = 0; i < 64; i += 4)
        ((U32*)sh_MT)[threadIdx.x + i] = ((U32*)MT)[threadIdx.x + i];

    __shared__ unsigned int sh_rkey[4];
    __shared__ unsigned int sh_state_pl[4];
    __shared__ unsigned int sh_state_ct[4];

    sh_state_pl[threadIdx.x] = PT[threadIdx.x];
    sh_rkey[threadIdx.x] = rk[threadIdx.x];
    __syncthreads();


    sh_state_ct[threadIdx.x] = ((U32*)sh_MT)[sh_state_pl[threadIdx.x]]^\
    ((U32*)(sh_MT+3))[((sh_state_pl[(1 + threadIdx.x) % 4] >> 8) & 0xff)] ^ \
    ((U32*)(sh_MT+2))[((sh_state_pl[(2 + threadIdx.x) % 4] >> 16) & 0xff)] ^\
    ((U32*)(sh_MT+1))[((sh_state_pl[(3 + threadIdx.x) % 4] >> 24) & 0xff )];


    CT[threadIdx.x] = sh_state_ct[threadIdx.x];
}

At this line of code,

((U32*)(sh_MT+3))......

the CUDA debugger gives me the error message "misaligned address".

How can I fix this error?

I am using CUDA 7 with MSVC, and I launch the kernel with 1 block and 4 threads as follows:

__device__ unsigned int *state;
__device__ unsigned int *key;
__device__ unsigned int *ct;
.
.
main()
{
cudaMalloc((void**)&state, 16);
cudaMalloc((void**)&ct, 16);
cudaMalloc((void**)&key, 16);
//cudamemcpy(copy some values to => state , ct, key);   
Kernel<<<1, 4>>>(state, ct, key);
}

Please note that I can't change the type of my MT table. Thanks in advance for any advice or answer.

m.r226 asked Dec 15 '22 06:12


1 Answer

The error message means that the pointer is not aligned to the boundary the processor requires for the type being accessed.

From the CUDA Programming Guide, section 5.3.2:

Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).

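This is what the debugger is trying to tell you: you must not dereference a pointer to a 32-bit value at an address that is not a multiple of 4 bytes. The quoted passage talks about global memory, but the same natural-alignment requirement applies to shared memory such as sh_MT.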

You can use (U32*)(sh_MT) and (U32*)(sh_MT+4) just fine, but not (U32*)(sh_MT+1), (U32*)(sh_MT+2) or (U32*)(sh_MT+3).
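To make the rule concrete, here is a minimal sketch of an alignment check. The helper name is_aligned_for_u32 is hypothetical (it is not part of CUDA), and the macro guard is only there so the same snippet also builds with a plain host compiler.

```cpp
#include <cstdint>

// Let the snippet build without nvcc; under nvcc these qualifiers are built in.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not a CUDA API): true if p may safely be
// dereferenced as a 32-bit word, i.e. its address is a multiple of 4.
__host__ __device__ inline bool is_aligned_for_u32(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}
```

Assuming sh_MT itself starts on a 4-byte boundary (something `__align__(4)` on the declaration can request), this check passes for sh_MT and sh_MT+4 and fails for sh_MT+1, sh_MT+2 and sh_MT+3.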

You will probably have to read the bytes separately and combine them into a 32-bit word yourself.
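One way to do that, as a rough sketch rather than a drop-in fix: the helper below (load_u32_unaligned is a made-up name) reads four consecutive bytes and assembles them in little-endian order, which is the byte order NVIDIA GPUs use, so it returns the value the misaligned U32 read was meant to produce. The table type stays unsigned char.

```cpp
#include <cstdint>

// Let the snippet build without nvcc; under nvcc these qualifiers are built in.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not a CUDA API): read a 32-bit little-endian word
// starting at an arbitrary byte address, using only 1-byte loads,
// so no alignment requirement applies.
__host__ __device__ inline std::uint32_t load_u32_unaligned(const unsigned char *p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

With this, an expression like ((U32*)(sh_MT+3))[idx] would become load_u32_unaligned(sh_MT + 3 + 4 * idx). Keeping that byte offset (plus 3) inside the 256-byte table is still up to the caller. It costs four byte loads instead of one word load, which is the price of not changing the table layout.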

CherryDT answered Dec 28 '22 17:12