I am currently trying to render complex models at a decent speed, and I'm having some trouble: rendering even a single model strains my framerate, without any added work in the program. My model (of which there is only one in the scene) appears to be too large. There are 444384 floats in the vertex array I upload to the buffer (6 floats per vertex, so 74064 vertices, or 24688 triangles in the model).
//Create vertex buffers
glGenBuffers(1, &m_Buffer);
glBindBuffer(GL_ARRAY_BUFFER, m_Buffer);
int SizeInBytes = m_ArraySize * 6 * sizeof(float);
glBufferData(GL_ARRAY_BUFFER, SizeInBytes, NULL, GL_DYNAMIC_DRAW);
//Upload buffer data
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * VertArray.size(), &VertArray[0]);
I know the size of the VBO is what makes the difference because A) reducing the size improves performance, and B) commenting out the rendering code:
glPushMatrix();
//Translate
glTranslatef(m_Position.x, m_Position.y, m_Position.z);
glMultMatrixf(m_RotationMatrix);
//Bind buffers for vertex and index arrays
glBindBuffer(GL_ARRAY_BUFFER, m_Buffer);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 6 * sizeof(float), 0);
glEnableClientState(GL_NORMAL_ARRAY);
glNormalPointer(GL_FLOAT, 6 * sizeof(float), (void*)12);
//Draw
glDrawArrays(GL_TRIANGLES, 0, m_ArraySize);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
//Unbind the buffers
glBindBuffer(GL_ARRAY_BUFFER, 0);
glPopMatrix();
leaves me with around 2000-2500 FPS, whereas uncommenting it sinks me down to around 130 FPS, or about 8 ms/frame (which on its own is already most of my frame budget, and I need to be able to do other things in the program as well, some of which might be CPU-intensive). A more complex model with 85k triangles brings that down to under 50 FPS, or around 20 ms/frame, at which point the program visibly stutters.
The one pair of shaders I use is very minimal at this point, so I doubt that's the issue. Here they are anyway, just in case; first the vertex shader:
void main()
{
    vec3 normal, lightDir;
    vec4 diffuse;
    float NdotL;

    /* first transform the normal into eye space and normalize the result */
    normal = normalize(gl_NormalMatrix * gl_Normal);

    /* now normalize the light's direction. Note that according to the
       OpenGL specification, the light is stored in eye space. Also, since
       we're talking about a directional light, the position field is
       actually a direction */
    lightDir = normalize(vec3(gl_LightSource[0].position));

    /* compute the cosine of the angle between the normal and the light
       direction. The light is directional, so the direction is constant
       for every vertex. Since both vectors are normalized, the cosine is
       the dot product. We also need to clamp the result to the [0,1]
       range. */
    NdotL = max(dot(normal, lightDir), 0.0);

    /* compute the diffuse term */
    diffuse = gl_FrontMaterial.diffuse * gl_LightSource[0].diffuse;
    gl_FrontColor = NdotL * diffuse;

    gl_Position = ftransform();
}
And the fragment shader:
void main()
{
    gl_FragColor = gl_Color;
}
I am running the program using a GTX 660M as my graphics card.
Now as far as I know, VBOs are the fastest way to render large amounts of geometry in OpenGL, and the Internet suggests that many machines can process and display millions of polygons at once, so I'm sure there must be a way to optimize the rendering of my comparatively measly ~25k triangles. I'd rather do that now than have to rewrite and restructure larger amounts of code in the future.
I have enabled backface culling; I am not sure frustum culling would help because at times all or most of the model is onscreen (I currently cull whole objects, but not triangles within individual objects). Culling the faces in the viewport that are not facing the camera might help a bit, but I'm not sure how to do that. Beyond that, I'm not sure what to do to optimize the rendering. I haven't implemented an index buffer yet, but I've read that it might only increase the speed by around 10%.
How do people achieve tens or hundreds of thousands of triangles on-screen at once at acceptable framerates with other stuff going on? What can I do to improve the performance of my VBO rendering?
UPDATE: As per comments below, I drew only half of the array as follows:
glDrawArrays(GL_TRIANGLES, 0, m_ArraySize/2);
And then a quarter of the array:
glDrawArrays(GL_TRIANGLES, 0, m_ArraySize/4);
Reducing the number of elements drawn each time halved the frame time (from 12 ms to 6 ms and then 3 ms, respectively), yet the model was entirely intact - nothing was missing. This seems to suggest that I am doing something wrong somewhere else, but I don't know what; I'm fairly confident I'm not adding the same triangles four or more times when I compose the model, so what else could it be? Might I perhaps somehow be uploading the buffer multiple times?
glDrawArrays() takes as its third argument the number of vertices (indices) to draw, not the number of floats. You are passing in the number of floats in your interleaved vertex-and-normal array, which is six times the vertex count. The GPU is lagging because you're telling it to access data outside the bounds of your buffer -- modern GPUs can trigger a fault when this happens; older ones would just crash your system :)
Consider the following interleaved array:
vx0 vy0 vz0 nx0 ny0 nz0 vx1 vy1 vz1 nx1 ny1 nz1 vx2 vy2 vz2 nx2 ny2 nz2
This array contains three vertices and three normals (a single triangle). Drawing a triangle requires three vertices, so you need three indices to select them. To draw the triangle above, you would use:
glDrawArrays(GL_TRIANGLES, 0, 3);
Because of the way attributes work (positions, normals, colors, texture coordinates, etc.), a single index selects a value from EACH of the attribute arrays. If you added a color attribute to the triangle above, you would still use only 3 indices.
I think the problem is that each triangle in your model has its own three vertices. You're not using indexed triangles (GL_ELEMENT_ARRAY_BUFFER, glDrawElements), which would let vertex data be shared between triangles.
From what I can tell, there are two issues with your current approach:
1. The sheer amount of vertex data that needs to be processed (although this can be a problem with indexed triangles as well).
2. When using glDrawArrays() as opposed to glDrawElements(), the GPU cannot make use of the post-transform cache, which is used to reduce the amount of vertex processing.
If possible, re-arrange your data to use indexed triangles.
I'll just add the caveat that if you use indexed triangles, you have to make sure that you're sharing vertex data between triangles as much as possible to get the best performance. It's really about how well you organise your data.