 

glBufferSubData performance abysmal on iOS?

I can't quite grasp why this code is so slow on the iOS GPU; the same code works great on Windows without any problems.

Basically, I have one big dynamic vertex buffer (GL_STREAM_DRAW) and I update only portions of it. Within a single frame those portions shouldn't overlap, so the updates shouldn't cause flushes and the CPU shouldn't have to wait for the GPU to finish. That is clearly not the case: I get approximately 10 FPS on an iPhone 4 even when drawing as few as 10 to 20 triangles, whereas I get more than 400 FPS on my PC with the same code.

As you can see in the trace, I reuse the same buffer, but I make sure the updated portions don't overlap. What could I do to improve performance?

Index   Trace
695 glBindBuffer(GL_ARRAY_BUFFER, 1u)
696 glBufferSubData(GL_ARRAY_BUFFER, 144l, 144l, 0x0453d090)
697 glBlendFunc(GL_SRC_ALPHA, GL_ZERO)
698 glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
699 glActiveTexture(GL_TEXTURE0)
700 glBindTexture(GL_TEXTURE_2D, 12u)
701 glUseProgram(12ul)
702 glUniform4fv(uniform_000000001cd24950_12_0, 1, {0.0500000f, 0.0000000f, 0.0000000f, 0.0000000f})
703 glUniform4fv(uniform_000000001cd24950_12_1, 1, {0.0000000f, 0.0333333f, 0.0000000f, 0.0000000f})
704 glUniform4fv(uniform_000000001cd24950_12_2, 1, {0.0000000f, 0.0000000f, -0.0010010f, 0.0000000f})
705 glUniform4fv(uniform_000000001cd24950_12_3, 1, {-0.0000000f, 0.6333333f, -0.0010010f, 1.0000000f})
706 glDrawArrays(GL_TRIANGLES, 6, 6)
707 glBindBuffer(GL_ARRAY_BUFFER, 1u)
708 glBufferSubData(GL_ARRAY_BUFFER, 288l, 144l, 0x0453d120)
Francois Hamel asked Jun 12 '11 at 20:06



1 Answer

I guess the iOS driver just isn't smart enough to see that the ranges you update with glBufferSubData don't overlap with the ones currently being processed. I'm not even sure your PC driver is smart enough for that; it could be that your PC's overall performance simply hides the problem. How the driver synchronizes, and whether it optimizes this case at all, is entirely up to the driver.

One solution could be the ARB_map_buffer_range extension, which lets you give the driver explicit hints about synchronization. I'm not sure whether it is supported in ES, though (the ES counterpart is named EXT_map_buffer_range, so check whether your device's driver exposes it). Otherwise, you won't get around splitting your buffer into multiple smaller ones.

Christian Rau answered Oct 12 '22 at 09:10
