I'm working on a GPU-accelerated program that requires reading an entire file of variable size. My question: what is the optimal number of bytes to read from a file and transfer to a coprocessor (CUDA device)?
These files could be as large as 2GiB, so creating a buffer of that size doesn't seem like the best idea.
You can cudaMalloc a buffer of the largest size your device allows. After that, copy chunks of your input data of that size from host to device, process each chunk, copy back the results, and continue.
// Your input data on host
int hostBufNum = 5600000;
int* hostBuf = ...;

// Assume this is the largest device buffer you can allocate
int devBufNum = 1000000;
int* devBuf;
cudaMalloc( &devBuf, sizeof( int ) * devBufNum );

int* hostChunk = hostBuf;
int hostLeft = hostBufNum;

do
{
    // Recompute the chunk size each iteration so the final,
    // partial chunk doesn't read past the end of hostBuf
    int chunkNum = ( hostLeft < devBufNum ) ? hostLeft : devBufNum;

    cudaMemcpy( devBuf, hostChunk, chunkNum * sizeof( int ), cudaMemcpyHostToDevice );

    int threads = 256;
    int blocks = ( chunkNum + threads - 1 ) / threads;
    doSomethingKernel<<< blocks, threads >>>( devBuf, chunkNum );

    // Copy results back here before reusing devBuf, if needed

    hostChunk += chunkNum;
    hostLeft  -= chunkNum;
} while( hostLeft > 0 );
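Since the data comes from a file, you can apply the same idea one step earlier and read the file itself in device-buffer-sized chunks, so the full 2 GiB never has to sit in host memory at once. Here is a minimal sketch in plain C++; the function name `readInChunks` and the input path are hypothetical, and the CUDA calls are left as comments so the sketch stands alone:

```cpp
#include <cstdio>
#include <cstddef>
#include <vector>

// Read a file in fixed-size chunks; returns the total bytes consumed.
// In the real program each chunk would be copied to devBuf and processed
// by the kernel before the next chunk is read.
size_t readInChunks(const char* path, size_t chunkBytes)
{
    std::vector<char> hostChunk(chunkBytes);  // reusable host staging buffer

    FILE* f = std::fopen(path, "rb");
    if (!f) return 0;

    size_t total = 0;
    size_t got;
    while ((got = std::fread(hostChunk.data(), 1, chunkBytes, f)) > 0)
    {
        // cudaMemcpy(devBuf, hostChunk.data(), got, cudaMemcpyHostToDevice);
        // doSomethingKernel<<<blocks, threads>>>(devBuf, got / sizeof(int));
        total += got;
    }

    std::fclose(f);
    return total;
}
```

With this structure the chunk size is just `devBufNum * sizeof(int)` from the snippet above, and the last `fread` naturally returns a short count for the final partial chunk.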