I've been having some issues with transfering a GPU buffer into CPU for performing sorting operations. The buffer is a GL_SHADER_STORAGE_BUFFER
composed of 300.000 float values. The transfer operation with glGetBufferSubData
is taking around 10ms, and with glMapBufferRange
, it takes more than 100 ms.
The code Im using is the following:
std::vector<GLfloat> viewRow;
unsigned int viewRowBuffer = -1;
int length = -1;
void bindRowBuffer(unsigned int buffer){
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, buffer);
}
void initRowBuffer(unsigned int &buffer, std::vector<GLfloat> &row, int lengthIn){
// Generate and initialize buffer
length = lengthIn;
row.resize(length);
memset(&row[0], 0, length*sizeof(float));
glGenBuffers(1, &buffer);
bindRowBuffer(buffer);
glBufferStorage(GL_SHADER_STORAGE_BUFFER, row.size() * sizeof(float), &row[0], GL_DYNAMIC_STORAGE_BIT | GL_MAP_READ_BIT | GL_MAP_WRITE_BIT);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}
void cleanRowBuffer(unsigned int buffer) {
float zero = 0.0;
glClearNamedBufferData(buffer, GL_R32F, GL_RED, GL_FLOAT, &zero);
}
void readGPUbuffer(unsigned int buffer, std::vector<GLfloat> &row) {
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0,length *sizeof(float),&row[0]);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}
void readGPUMapBuffer(unsigned int buffer, std::vector<GLfloat> &row) {
float* data = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, length*sizeof(float), GL_MAP_READ_BIT); glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
memcpy(&row[0], data, length *sizeof(float));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}
The main is doing:
bindRowBuffer(viewRowBuffer);
cleanRowBuffer(viewRowBuffer);
countPixs.bind();
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, gPatch);
countPixs.setInt("gPatch", 0);
countPixs.run(SCR_WIDTH/8, SCR_HEIGHT/8, 1);
countPixs.unbind();
readGPUbuffer(viewRowBuffer, viewRow);
Where countPixs is a compute shader, but I'm possitive the problem is not there because if I comment the run command, the read takes exactly the same amount of time.
The weird thing is that if I execute a getbuffer of only 1 float:
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0, 1 *sizeof(float),&row[0]);
It takes exactly the same time... so I'm guessing there is something wrong all-the-way... maybe related to the GL_SHADER_STORAGE_BUFFER
?
This is likely to be an GPU-CPU synchronization/round trip caused delay. I.e. once you map your buffer the previous GL command(s) which touched the buffer needs to complete immediately causing pipeline stall. Note that drivers are lazy: it is very probable GL commands have not even started executing yet.
If you can: glBufferStorage(..., GL_MAP_PERSISTENT_BIT)
and map the buffer persistently. This avoids completely re-mapping and allocation of any GPU memory and you can keep the mapped pointer over draw calls with some caveats:
After reading GL 4.5 docs bit more I found out that glFenceSync
is mandatory in order to guarantee the data has arrived from the GPU, even with GL_MAP_COHERENT_BIT:
If GL_MAP_COHERENT_BIT is set and the server does a write, the app must call glFenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With