I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s.
To do this I need to mix two languages, C and C++ (nvcc is a C++ compiler).
The problem is that the C++ compiler sees a structure as a certain size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I can't find a cause for a 4-byte discrepancy.
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o
My C++ looks like
#include <stdio.h>
#include <stdlib.h>
#include "assert.h"
extern "C"
{
#include "structInfo.h" //contains the structure declaration
}
...
and my C files look like
#include "structInfo.h"
...
with structInfo.h looking like
struct TB {
int nbranch, nnode, root, branches[NBRANCH][2];
double lnL;
} tree;
...
My make file looks like
PRGS = prog
CC = cc
CFLAGS=-std=gnu99 -m32
CuCC = nvcc
CuFlags =-arch=sm_20
LIBS = -lm -L/usr/local/cuda-5.0/lib -lcuda -lcudart
all : $(PRGS)
prog: prog.c gpu.o
	$(CC) $(CFLAGS) prog.c gpu.o $(LIBS) -o prog
gpu.o: gpu.cu
	$(CuCC) $(CuFlags) -c gpu.cu
Some people asked me why I didn't use a different host compilation option. I think the host compilation option was deprecated about two releases ago. Also, it never appeared to do what it said it would do.
nvcc warning : option 'host-compilation' has been deprecated and is ignored
GPUs require natural alignment for all data, e.g. a 4-byte int needs to be aligned to a 4-byte boundary and an 8-byte double or long long needs to have 8-byte alignment. CUDA enforces this for host code as well to make sure structs are as compatible as possible between the host and device portions of the code. x86 CPUs, on the other hand, do not generally require data to be naturally aligned (although a performance penalty may result from a lack of alignment).
In this case, CUDA needs to align the double component of the struct to an 8-byte boundary. Since an odd number of int components precede the double, this requires padding. Switching the order of components, i.e. putting the double component first, does not help because in an array of such structs each struct would have to be 8-byte aligned and the size of the struct therefore must be a multiple of 8 bytes to accomplish that, which also requires padding.
To force gcc to align doubles in the same way CUDA does, pass the flag -malign-double.
Seems like different padding is being applied by the two compilers: one is working with 4-byte alignment and the other with at least 8-byte alignment. You should be able to force the alignment you want with compiler-specific #pragma directives (check your compiler documentation for the specific #pragma).
There is no guarantee that two different C compilers will use the same representation for the same type -- unless they both conform to some external standard (an ABI) that specifies the representation in sufficient detail.
It's most likely a difference in padding, where one compiler requires a double to be 4-byte aligned and the other requires it to be 8-byte aligned. Both choices are perfectly valid as far as the C and C++ standards are concerned.
You can investigate this in more detail by printing out the sizes and offsets of all the members of your structure:
printf("nbranch: size %3u offset %3u\n",
(unsigned)sizeof tree.nbranch,
(unsigned)offsetof(struct TB, nbranch));
/* and similarly for the other members */
There may be a compiler-specific way to specify a different alignment, but such techniques are not always safe.
The ideal solution would be to use the same compiler for the C and C++ code. C is not a subset of C++, but it generally shouldn't be too difficult to modify existing C code so it compiles as C++.
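Or you might be able to rearrange your structure definition so that both compilers happen to lay it out the same way. Placing the double member first may make the member offsets agree, but the overall size can still differ if the compilers disagree on the struct's alignment and therefore on its tail padding. Rearranging is not guaranteed to work, and it could break with future versions of either compiler, so check the result with both compilers.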
Don't forget that there could also be padding at the very end of the structure; this is sometimes necessary to guarantee proper alignment for arrays of structures. Look at sizeof (struct TB) and compare it to the size and offset of the last declared member.
Another possibility: Insert explicit unused members to force a consistent alignment. For example, suppose you have:
struct foo {
uint16_t x;
uint32_t y;
};
and one compiler puts y at 16 bits, and the other puts it at 32 bits with 16 bits of padding. If you change the definition to:
struct foo {
uint16_t x;
uint16_t unused_padding;
uint32_t y;
};
then you're more likely to have x and y have the same offset under both compilers. You'll still have to experiment to make sure everything is consistent.
Since the C and C++ code are going to be part of the same program (right?), you shouldn't have to worry about things like varying byte order. If you wanted to transmit values of your structure type between separate programs, say by storing them in files or transmitting them over a network, you might need to define a consistent way to serialize a structure value into a sequence of bytes and vice versa.