The code below actually works, so I am wondering: is CUDA dynamically allocating memory on the device inside the thread? If so, what is the use of __device__ malloc, given that this seems much faster in comparison? What really goes on behind the scenes when you call cudaMalloc inside a kernel, since it appears to be so much faster than a plain malloc on the device heap?
#include <iostream>
#include <numeric>
#include <stdlib.h>
#include <time.h>

__global__ void testMem(int* time){
    int* a;
    cudaMalloc(&a, sizeof(int));
    a[0] = 4;
    time[0] = a[0];
}
__global__ void testMem2(int* time){
}
int main(){
    int* h_time = (int*)malloc(sizeof(int));
    h_time[0] = 0;
    int* d_time;
    cudaMalloc(&d_time, sizeof(int));

    clock_t start1 = clock();
    cudaMemcpy(d_time, h_time, sizeof(int), cudaMemcpyHostToDevice);
    testMem<<<1,1>>>(d_time);
    cudaMemcpy(h_time, d_time, sizeof(int), cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();
    clock_t end1 = clock();

    int result = end1 - start1;
    //float result = (float)*h_time;
    //result = result / CLOCKS_PER_SEC;
    std::cout << result << std::endl;
    std::cout << *h_time << std::endl;
    //std::cout << (1 << 10);

    cudaFree(d_time);
    free(h_time);
}
Starting with compute capability 3.5, you may use part of the CUDA runtime API within kernels. These methods are declared as __host__ __device__ in the documentation, as here:

__host__ __device__ cudaError_t cudaMalloc ( void** devPtr, size_t size )
Allocate memory on the device.

When doing so, remember to link against the device runtime library: cudadevrt.lib.
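As a minimal sketch (assuming a device of compute capability 3.5 or higher, and that the file is built with relocatable device code and linked against the device runtime), an in-kernel cudaMalloc could look like this; the kernel and variable names are just illustrative:

// Minimal sketch: calling cudaMalloc/cudaFree from device code.
// Build with relocatable device code and the device runtime, e.g.:
//   nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt
#include <cstdio>

__global__ void deviceAllocKernel(int* out){
    int* a = NULL;
    cudaMalloc(&a, sizeof(int));   // device-side runtime call
    a[0] = 4;
    out[0] = a[0];
    cudaFree(a);                   // free from device code as well
}

int main(){
    int* d_out;
    cudaMalloc(&d_out, sizeof(int));
    deviceAllocKernel<<<1,1>>>(d_out);
    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("%d\n", h_out);   // expect 4
    cudaFree(d_out);
}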
There is another way to allocate memory dynamically on the device: the in-kernel malloc, which is implemented differently (documented here). It uses a small memory heap and does not require the same compute capability.
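For comparison, here is a minimal sketch of the heap-based malloc/free path. The heap size passed to cudaDeviceSetLimit is only an illustrative value (the default device heap is 8 MB), and the kernel name is hypothetical:

#include <cstdio>
#include <cstdlib>

__global__ void heapAllocKernel(int* out){
    // Allocated from the device heap; returns NULL if the heap is exhausted.
    int* a = (int*)malloc(sizeof(int));
    if (a != NULL){
        a[0] = 4;
        out[0] = a[0];
        free(a);
    }
}

int main(){
    // Optionally enlarge the device heap before any kernel uses malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    int* d_out;
    cudaMalloc(&d_out, sizeof(int));
    heapAllocKernel<<<1,1>>>(d_out);
    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("%d\n", h_out);
    cudaFree(d_out);
}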