CUDA Dynamic Parallelism MakeFile

Tags:

c

cuda

This is my first program using Dynamic Parallelism and I am unable to compile the code. I need to be able to run this for my research project at college and any help will be most appreciated:

I get the following error:

/cm/shared/apps/cuda50/toolkit/5.0.35/bin/nvcc -m64 -dc  -gencode arch=compute_35,code=sm_35 -rdc=true -dlink -po maxrregcount=16 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes.o -c BlackScholes.cu
g++ -m64 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes_gold.o -c BlackScholes_gold.cpp
g++ -m64 -o BlackScholes BlackScholes.o BlackScholes_gold.o -L/cm/shared/apps/cuda50/toolkit/5.0.35/lib64 -lcudart -lcudadevrt
BlackScholes.o: In function `__sti____cudaRegisterAll_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec()':
tmpxft_000059cb_00000000-3_BlackScholes.cudafe1.cpp:(.text+0x1354): undefined reference to `__cudaRegisterLinkedBinary_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec'
collect2: ld returned 1 exit status
make: *** [BlackScholes] Error 1

I have one cpp file, one cu file and one cuh file. Important portions of my makefile are below:

# CUDA code generation flags
#GENCODE_SM10    := -gencode arch=compute_10,code=sm_10
GENCODE_SM20     := -gencode arch=compute_20,code=sm_20
GENCODE_SM30     := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35
GENCODE_SM35     := -gencode arch=compute_35,code=sm_35
#GENCODE_FLAGS    := $(GENCODE_SM10) $(GENCODE_SM20) $(GENCODE_SM30)
GENCODE_FLAGS    := $(GENCODE_SM35)

# OS-specific build flags
ifneq ($(DARWIN),)
      LDFLAGS   := -Xlinker -rpath $(CUDA_LIB_PATH) -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -arch $(OS_ARCH)
else
  ifeq ($(OS_SIZE),32)
      LDFLAGS   := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -m32
  else
      LDFLAGS   := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -m64
  endif
endif

# OS-architecture specific flags
ifeq ($(OS_SIZE),32)
      NVCCFLAGS := -m32 -dc
else
      NVCCFLAGS := -m64 -dc
endif

# Debug build flags
ifeq ($(dbg),1)
      CCFLAGS   += -g
      NVCCFLAGS += -g -G
      TARGET := debug
else
      TARGET := release
endif


# Common includes and paths for CUDA
INCLUDES      := -I$(CUDA_INC_PATH) -I. -I.. -I../../common/inc

# Additional parameters
MAXRREGCOUNT  :=  -po maxrregcount=16

# Target rules
all: build

build: BlackScholes

BlackScholes.o: BlackScholes.cu
        $(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) -rdc=true -dlink $(MAXRREGCOUNT) $(INCLUDES) -o $@ -c $<

BlackScholes_gold.o: BlackScholes_gold.cpp
        $(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<

BlackScholes: BlackScholes.o BlackScholes_gold.o
        $(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)
        mkdir -p ../../bin/$(OSLOWER)/$(TARGET)
        cp $@ ../../bin/$(OSLOWER)/$(TARGET)
    enter code here

run: build
        ./BlackScholes

762

asked Feb 27 '14 17:02

Kanishk Kanoria

1 Answers

When using the host linker (g++) for final linking of your executable, and when using relocatable device code (nvcc -dc), it's necessary to do an intermediate device code link step.

From the documentation:

If you want to invoke the device and host linker separately, you can do:

nvcc –arch=sm_20 –dc a.cu b.cu
nvcc –arch=sm_20 –dlink a.o b.o –o link.o
g++ a.o b.o link.o –L<path> -lcudart

Since you are specifying -dc on the compile line, you are getting a compile-only operation (just as if you had specified -c to g++).

Here's a modified/condensed Makefile that should show you what is involved:

GENCODE_SM35     := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS    := $(GENCODE_SM35)

LDFLAGS   := -L/usr/local/cuda/lib64 -lcudart -lcudadevrt
CCFLAGS   := -m64

NVCCFLAGS := -m64 -dc

NVCC := nvcc
GCC := g++

# Debug build flags
ifeq ($(dbg),1)
      CCFLAGS   += -g
      NVCCFLAGS += -g -G
      TARGET := debug
else
      TARGET := release
endif


# Common includes and paths for CUDA
INCLUDES      := -I/usr/local/cuda/include -I. -I..

# Additional parameters
MAXRREGCOUNT  :=  -po maxrregcount=16

# Target rules
all: build

build: BlackScholes

BlackScholes.o: BlackScholes.cu
        $(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) $(MAXRREGCOUNT) $(INCLUDES) -o $@ $<
        $(NVCC) -dlink  $(GENCODE_FLAGS) $(MAXRREGCOUNT)  -o bs_link.o $@

BlackScholes_gold.o: BlackScholes_gold.cpp
        $(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<

BlackScholes: BlackScholes.o BlackScholes_gold.o bs_link.o
        $(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)

run: build
        ./BlackScholes

172

answered Sep 28 '22 04:09

Robert Crovella

Related questions
                            
                                python's yield feature in C/C++? [duplicate]
                            
                                Putting a composite statement in the condition of a for loop
                            
                                bind Invalid argument
                            
                                Auto-connection signals with GtkBuilder but on GTKmm
                            
                                Hiding longjmps in C++ interface to C code
                            
                                Disable application after expiry date for trial
                            
                                lemon parser parsing 0 token
                            
                                Why do we need *.lib files? [closed]
                            
                                Optimize log entropy calculation in sparse matrix
                            
                                getting source code for linux's /bin/ss tool [closed]
                            
                                Message Truncated in MPI_Recv
                            
                                Why printf() when printing multiple strings (%s) leaves newline and how to solve this?
                            
                                Python with C libraries
                            
                                "Bad file descriptor" error when reading from pipe as stdin
                            
                                Getting The Memory Address Of A DLL Function
                            
                                Creating a new GSource in GLib
                            
                                Why do those two ways to set a variable to all 1s lead to different results?
                            
                                How generate pseudo-random numbers in uniform and gaussian distribution without float/double numbers?
                            
                                Virtual Memory on OSX/iOS versus Windows commit/reserve behaviour
                            
                                Why does returning a floating-point value change its value?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With