I have been running into issues with a kernel that uses some structs I defined in C++. The error cuda-memcheck gives me is an alignment problem.
The struct contains some pointers, which I suspect are causing the problem. I printed the size of the struct on the C++ side and on the CUDA side, both in the host function in the .cu file and inside the kernel. The results differ, which explains the problem I'm seeing, but I'm not sure why it happens or how to fix it.
The struct I'm using is the following:
struct Node {};
struct S
{
Node *node0;
Node *node1;
Node *node2;
double p0;
double p1;
double p2;
double p3;
Eigen::Matrix<double, 3, 2> f1;
Eigen::Matrix<double, 3, 2> f2;
};
This has a size of 160 bytes in C++, but 152 bytes in CUDA. To transfer the data, I allocate a device-side buffer and do a cudaMemcpy:
std::vector<S> someVector; // Consider it exists
S *ptr;
cudaMalloc(&ptr, sizeof(S) * someVector.size());
cudaMemcpy(ptr, someVector.data(), sizeof(S)*someVector.size(), cudaMemcpyHostToDevice);
which I guess is wrong, since the sizes in CUDA and C++ differ.
As soon as I try to access S::node0, S::node1 or S::node2 in the kernel, I get an unaligned access error.
So I have three questions regarding this issue:
Edit:
Thanks to the accepted answer, I was able to understand the cause of the issue. Eigen uses vectorization when possible and requests 16-byte alignment for it. Vectorization is enabled when the Eigen object's size is a multiple of 16 bytes; in my particular case, both Eigen::Matrix<double, 3, 2> members qualify for vectorization.
In CUDA, however, Eigen doesn't request 16-byte alignment.
Since my struct has four doubles and three pointers, that accounts for 56 bytes, which is not a multiple of 16, so on the CPU the compiler must add 8 padding bytes so that the Eigen matrices are 16-byte aligned. In CUDA this doesn't happen, so the sizes differ.
The solution I implemented is to add the 8 padding bytes by hand, so the structure layout is the same on the CPU and in CUDA. This solves the problem and doesn't require disabling vectorization. Another solution I found to work is to replace each Eigen::Matrix<double, 3, 2> with two Eigen::Matrix<double, 3, 1>; Eigen::Matrix<double, 3, 1> doesn't meet the size requirement for vectorization, so it doesn't need the 8 padding bytes on the CPU.
The difference is due to how Eigen requests memory alignment in C++ and in CUDA.
In C++, S is aligned to 16 bytes (you can check that alignof(S) == 16). This comes from Eigen's matrices, which are 16-byte aligned, presumably because of the use of SSE registers, which require that alignment. The rest of your fields are aligned to 8 bytes (64-bit pointers and doubles).
In the Eigen/Core header, the EIGEN_DONT_VECTORIZE macro is defined when compiling for CUDA. The documentation says:
EIGEN_DONT_VECTORIZE - disables explicit vectorization when defined. Not defined by default, unless alignment is disabled by Eigen's platform test or the user defining EIGEN_DONT_ALIGN.
which basically means that Eigen matrices get no special alignment in CUDA: they are aligned like their element type (double in your case), resulting in 8-byte alignment for the matrices and therefore for the whole structure.
The best way to solve it is to force the alignment of the structure for both architectures. I'm not so fluent in CUDA right now, but I think you can do it with __align__(16) in CUDA (more here), and with alignas(16) in C++ (since C++11). You can define a macro that picks the correct specifier if you share the declaration between both languages:
#ifdef __CUDACC__
# define MY_ALIGN(x) __align__(x)
#else
# define MY_ALIGN(x) alignas(x)
#endif
struct MY_ALIGN(16) S {
// ...
};
Anyway, be careful with such low-level copies, since Eigen's implementation in CUDA may differ from the C++ one (Eigen's documentation makes no guarantee about this).