I'm working on a tile-based OpenGL application in C++. Here is a sample screenshot from the application to make things clearer:
I have a Tile class which contains an array of Objects. Each Tile can store up to 15 Objects; an example is the Tile with a green and a yellow square on it (two Objects). That makes 10x10x15 = 1500 Objects to draw in the worst case, because I'm not handling 'empty' ones. Usually it's fewer; in my tests I use around 600 of them. An Object has its own graphic that can be drawn. Each Object belongs to one Tile at a time, but it can be moved (for example, the red squares in the picture).
Object backgrounds are going to have a border and need to scale nicely, so I'm using the 9-patch pattern to draw them (each background is made of 9 quads).
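To make the 9-patch layout concrete, here is a minimal CPU-side sketch of the geometry as a 4x4 grid of shared corner points plus an index list, which could be drawn with glDrawElements instead of duplicating corners. The names Vec2, NinePatchMesh and buildNinePatch are hypothetical and not part of the code in this question. The sketch only covers positions; sharing corners this way assumes neighbouring quads can also share texture coordinates, which is not the case if the middle cells use a half-texel offset or a second texture.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper types for illustration only.
struct Vec2 { float x, y; };

struct NinePatchMesh {
    std::array<Vec2, 16> positions;         // 4x4 grid of corner points
    std::array<std::uint16_t, 54> indices;  // 9 quads * 2 triangles * 3 indices
};

// Builds the corner grid for a width x height patch with a uniform border.
// Rows run top to bottom, columns left to right (y grows downwards).
NinePatchMesh buildNinePatch(float width, float height, float border) {
    const float xs[4] = {0.0f, border, width - border, width};
    const float ys[4] = {0.0f, border, height - border, height};

    NinePatchMesh mesh{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            mesh.positions[row * 4 + col] = Vec2{xs[col], ys[row]};

    // Index of the grid point at (row, col).
    auto at = [](int row, int col) {
        return static_cast<std::uint16_t>(row * 4 + col);
    };

    std::size_t n = 0;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            const std::uint16_t topLeft = at(row, col);
            const std::uint16_t topRight = at(row, col + 1);
            const std::uint16_t bottomLeft = at(row + 1, col);
            const std::uint16_t bottomRight = at(row + 1, col + 1);
            // First triangle: top right, top left, bottom left.
            mesh.indices[n++] = topRight;
            mesh.indices[n++] = topLeft;
            mesh.indices[n++] = bottomLeft;
            // Second triangle: bottom left, bottom right, top right.
            mesh.indices[n++] = bottomLeft;
            mesh.indices[n++] = bottomRight;
            mesh.indices[n++] = topRight;
        }
    }
    return mesh;
}
```

With this layout a background needs 16 positions instead of 54, and only the positions change when the background is resized; the index list stays the same for every background.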
Without drawing the Tiles (their Objects, to be precise), my application runs at around 600 fps.
At first, I was using the obsolete method to draw those Tiles: glBegin(GL_QUADS)/glEnd() and display lists. That drawing caused a big performance drop, from 600 to 320 fps. This is how I was drawing them:
bool Background::draw(const TPoint& pos, int width, int height)
{
if(width <= 0 || height <= 0)
return false;
//glFrontFace(GL_CW);
glPushMatrix();
glTranslatef((GLfloat)pos.x, (GLfloat)pos.y, 0.0f); // Move background to right direction
if((width != m_savedWidth) || (height != m_savedHeight)) // If size to draw is different than the one saved in display list,
// then recalculate everything and save in display list
{
// That size will be now saved in display list
m_savedWidth = width;
m_savedHeight = height;
// If this background doesn't have unique display list id specified yet,
// then let OpenGL generate one
if(m_displayListId == NO_DISPLAY_LIST_ID)
{
GLuint displayList;
displayList = glGenLists(1);
m_displayListId = displayList;
}
glNewList(m_displayListId, GL_COMPILE);
GLfloat texelCentersOffsetX = (GLfloat)1/(2*m_width);
// Instead of coordinates range 0..1 we need to specify new ones
GLfloat maxTexCoordWidth = m_bTiling ? (GLfloat)width/m_width : 1.0;
GLfloat maxTexCoordHeight = m_bTiling ? (GLfloat)height/m_height : 1.0;
GLfloat maxTexCoordBorderX = (GLfloat)m_borderWidth/m_width;
GLfloat maxTexCoordBorderY = (GLfloat)m_borderWidth/m_height;
/* 9-cell-pattern
-------------------
|  1  |  2  |  3  |
-------------------
|     |     |     |
|  4  |  9  |  5  |
|     |     |     |
-------------------
|  6  |  7  |  8  |
-------------------
*/
glBindTexture(GL_TEXTURE_2D, m_texture); // Select Our Texture
// Top left quad [1]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(0.0, maxTexCoordBorderY);
glVertex2i(0, 0 + m_borderWidth);
// Top left
glTexCoord2f(0.0, 0.0);
glVertex2i(0, 0);
// Top right
glTexCoord2f(maxTexCoordBorderX, 0.0);
glVertex2i(0 + m_borderWidth, 0);
// Bottom right
glTexCoord2f(maxTexCoordBorderX, maxTexCoordBorderY);
glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);
glEnd();
// Top middle quad [2]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, maxTexCoordBorderY);
glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);
// Top left
glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, 0.0);
glVertex2i(0 + m_borderWidth, 0);
// Top right
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, 0.0);
glVertex2i(0 + width - m_borderWidth, 0);
// Bottom right
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);
glEnd();
// Top right quad [3]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);
// Top left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, 0.0);
glVertex2i(0 + width - m_borderWidth, 0);
// Top right
glTexCoord2f(1.0, 0.0);
glVertex2i(0 + width, 0);
// Bottom right
glTexCoord2f(1.0, maxTexCoordBorderY);
glVertex2i(0 + width, 0 + m_borderWidth);
glEnd();
// Middle left quad [4]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(0.0, (GLfloat)1.0 - maxTexCoordBorderY );
glVertex2i(0, 0 + height - m_borderWidth);
// Top left
glTexCoord2f(0.0, maxTexCoordBorderY );
glVertex2i(0, 0 + m_borderWidth);
// Top right
glTexCoord2f(maxTexCoordBorderX, maxTexCoordBorderY );
glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);
// Bottom right
glTexCoord2f(maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY );
glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);
glEnd();
// Middle right quad [5]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);
// Top left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);
// Top right
glTexCoord2f(1.0, maxTexCoordBorderY);
glVertex2i(0 + width, 0 + m_borderWidth);
// Bottom right
glTexCoord2f(1.0, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + width, 0 + height - m_borderWidth);
glEnd();
// Bottom left quad [6]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(0.0f, 1.0);
glVertex2i(0, 0 + height);
// Top left
glTexCoord2f(0.0f, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0, 0 + height - m_borderWidth);
// Top right
glTexCoord2f(maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);
// Bottom right
glTexCoord2f(maxTexCoordBorderX, 1.0);
glVertex2i(0 + m_borderWidth, 0 + height);
glEnd();
// Bottom middle quad [7]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, 1.0);
glVertex2i(0 + m_borderWidth, 0 + height);
// Top left
glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);
// Top right
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);
// Bottom right
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, 1.0);
glVertex2i(0 + width - m_borderWidth, 0 + height);
glEnd();
// Bottom right quad [8]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, 1.0);
glVertex2i(0 + width - m_borderWidth, 0 + height);
// Top left
glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);
// Top right
glTexCoord2f(1.0, (GLfloat)1.0 - maxTexCoordBorderY);
glVertex2i(0 + width, 0 + height - m_borderWidth);
// Bottom right
glTexCoord2f(1.0, 1.0);
glVertex2i(0 + width, 0 + height);
glEnd();
GLfloat xTexOffset;
GLfloat yTexOffset;
if(m_borderWidth > 0)
{
glBindTexture(GL_TEXTURE_2D, m_centerTexture); // If there's a border, we have to use
// second texture now for middle quad
xTexOffset = 0.0; // We are using another texture, so middle middle quad
yTexOffset = 0.0; // has to be texture with a whole texture
}
else
{
// Don't bind any texture here - we're still using the same one
xTexOffset = maxTexCoordBorderX; // But it implies using offset which equals
yTexOffset = maxTexCoordBorderY; // maximum texture coordinates
}
// Middle middle quad [9]
glBegin(GL_QUADS);
// Bottom left
glTexCoord2f(xTexOffset, maxTexCoordHeight - yTexOffset);
glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);
// Top left
glTexCoord2f(xTexOffset, yTexOffset);
glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);
// Top right
glTexCoord2f(maxTexCoordWidth - xTexOffset, yTexOffset);
glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);
// Bottom right
glTexCoord2f(maxTexCoordWidth - xTexOffset, maxTexCoordHeight - yTexOffset);
glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);
glEnd();
glEndList();
}
glCallList(m_displayListId); // Now we can call earlier or now created display list
glPopMatrix();
return true;
}
There is probably too much code there, but I wanted to show everything. The main thing about this version is the use of display lists and glVertex2i, which are deprecated.
I thought the slowdown was caused by this obsolete method, which I've read is quite slow, so I decided to switch to VBOs. I used this tutorial and changed my method accordingly:
bool Background::draw(const TPoint& pos, int width, int height)
{
if(width <= 0 || height <= 0)
return false;
glPushMatrix();
glTranslatef((GLfloat)pos.x, (GLfloat)pos.y, 0.0f); // Move background to right direction
if((width != m_savedWidth) || (height != m_savedHeight)) // If size to draw is different than the one stored in the VBO,
// then recalculate everything and rebuild the VBO
{
// That size will now be the one stored in the VBO
m_savedWidth = width;
m_savedHeight = height;
GLfloat texelCentersOffsetX = (GLfloat)1/(2*m_width);
// Instead of coordinates range 0..1 we need to specify new ones
GLfloat maxTexCoordWidth = m_bTiling ? (GLfloat)width/m_width : 1.0;
GLfloat maxTexCoordHeight = m_bTiling ? (GLfloat)height/m_height : 1.0;
GLfloat maxTexCoordBorderX = (GLfloat)m_borderWidth/m_width;
GLfloat maxTexCoordBorderY = (GLfloat)m_borderWidth/m_height;
/* 9-cell-pattern, each number represents one quad
-------------------
|  1  |  2  |  3  |
-------------------
|     |     |     |
|  4  |  9  |  5  |
|     |     |     |
-------------------
|  6  |  7  |  8  |
-------------------
*/
/* How vertices are distributed on one quad made of two triangles
v1 ------ v0
|       / |
|     /   |
|   /     |
v2 ------ v3
*/
GLfloat vertices[] = {
// Top left quad [1]
m_borderWidth, 0, 0, // v0
0, 0, 0, // v1
0, m_borderWidth, 0, // v2
0, m_borderWidth, 0, // v2
m_borderWidth, m_borderWidth, 0, // v3
m_borderWidth, 0, 0, // v0
// Top middle quad [2]
width-m_borderWidth, 0, 0, // v0
m_borderWidth, 0, 0, // v1
m_borderWidth, m_borderWidth, 0, // v2
m_borderWidth, m_borderWidth, 0, // v2
width-m_borderWidth, m_borderWidth, 0, // v3
width-m_borderWidth, 0, 0, // v0
// Top right quad [3]
width, 0, 0, // v0
width-m_borderWidth, 0, 0, // v1
width-m_borderWidth, m_borderWidth, 0, // v2
width-m_borderWidth, m_borderWidth, 0, // v2
width, m_borderWidth, 0, // v3
width, 0, 0, // v0
// Middle left quad [4]
m_borderWidth, m_borderWidth, 0, // v0
0, m_borderWidth, 0, // v1
0, height-m_borderWidth, 0, // v2
0, height-m_borderWidth, 0, // v2
m_borderWidth, height-m_borderWidth, 0, // v3
m_borderWidth, m_borderWidth, 0, // v0
// Middle right quad [5]
width, m_borderWidth, 0, // v0
width-m_borderWidth, m_borderWidth, 0, // v1
width-m_borderWidth, height-m_borderWidth, 0, // v2
width-m_borderWidth, height-m_borderWidth, 0, // v2
width, height-m_borderWidth, 0, // v3
width, m_borderWidth, 0, // v0
// Bottom left quad [6]
m_borderWidth, height-m_borderWidth, 0, // v0
0, height-m_borderWidth, 0, // v1
0, height, 0, // v2
0, height, 0, // v2
m_borderWidth, height, 0, // v3
m_borderWidth, height-m_borderWidth, 0, // v0
// Bottom middle quad [7]
width-m_borderWidth, height-m_borderWidth, 0, // v0
m_borderWidth, height-m_borderWidth, 0, // v1
m_borderWidth, height, 0, // v2
m_borderWidth, height, 0, // v2
width-m_borderWidth, height, 0, // v3
width-m_borderWidth, height-m_borderWidth, 0, // v0
// Bottom right quad [8]
width, height-m_borderWidth, 0, // v0
width-m_borderWidth, height-m_borderWidth, 0, // v1
width-m_borderWidth, height, 0, // v2
width-m_borderWidth, height, 0, // v2
width, height, 0, // v3
width, height-m_borderWidth, 0, // v0
// Middle middle quad [9]
width-m_borderWidth, m_borderWidth, 0, // v0
m_borderWidth, m_borderWidth, 0, // v1
m_borderWidth, height-m_borderWidth, 0, // v2
m_borderWidth, height-m_borderWidth, 0, // v2
width-m_borderWidth, height-m_borderWidth, 0, // v3
width-m_borderWidth, m_borderWidth, 0 // v0
};
copy(vertices, vertices + 162, m_vCoords); // 162, because we have 162 coordinates
int dataSize = 162 * sizeof(GLfloat);
m_vboId = createVBO(m_vCoords, dataSize);
}
// bind VBOs for vertex array
glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_vboId); // for vertex coordinates
glEnableClientState(GL_VERTEX_ARRAY); // activate vertex coords array
glVertexPointer(3, GL_FLOAT, 0, 0);
glDrawArrays(GL_TRIANGLES, 0, 162);
glDisableClientState(GL_VERTEX_ARRAY); // deactivate vertex array
// bind with 0, so, switch back to normal pointer operation
glBindBufferARB(GL_ARRAY_BUFFER_ARB, NO_VBO_ID);
glPopMatrix();
return true;
}
It is quite similar to the previous version, but instead of display lists and glVertex2i() I used a VBO, which is created from data stored in an array.
But the results disappointed me: instead of a boost I got a performance drop, down to barely ~260 fps. I must also note that this version doesn't use textures yet, so for now there are only quads without any texture bound to them.
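A note on the counts in the VBO version above, written out as a small sketch. The constant names are hypothetical. The third argument of glDrawArrays is the number of vertices to draw, not the number of floats, and with glVertexPointer(3, GL_FLOAT, 0, 0) each vertex consumes three floats. The listing passes 162 to glDrawArrays, which looks like the float count rather than the vertex count, so it may be asking for three times as many vertices as the buffer holds.

```cpp
#include <cstddef>

// Hypothetical names; the values follow from the vertex array in the listing.
constexpr std::size_t kQuads = 9;            // 9-patch cells
constexpr std::size_t kVerticesPerQuad = 6;  // two triangles, no index buffer
constexpr std::size_t kFloatsPerVertex = 3;  // x, y, z

// Number of vertices: the value glDrawArrays expects as its count argument.
constexpr std::size_t kVertexCount = kQuads * kVerticesPerQuad;

// Number of floats in the array: the value used for copying the data.
constexpr std::size_t kFloatCount = kVertexCount * kFloatsPerVertex;

// Size in bytes for the buffer upload, assuming GLfloat is a plain float.
constexpr std::size_t kByteSize = kFloatCount * sizeof(float);

static_assert(kVertexCount == 54, "9 quads * 6 vertices");
static_assert(kFloatCount == 162, "54 vertices * 3 floats");
```

Under these assumptions the draw call would use 54 as its count, while 162 stays correct for the copy and for computing the buffer size.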
I've read a few articles to find the reason for the slowdown and found out that it may be due to the large number of small VBOs, and that I should probably have one VBO containing the data of all backgrounds instead of a separate VBO for each background. But the problem is that Objects can move around and they have different textures (and a texture atlas is not a good solution for me), so it would be difficult to update the data for those Objects that changed their state. For now, when an Object changes, I just recreate its VBO and the rest of the VBOs stay untouched.
So my question is: what am I doing wrong? Is using a larger number (~600) of small VBOs really slower than the obsolete method of drawing with glVertex2i? And what could be a better (maybe not the best) solution in my case?
By the looks of it, you're recreating the VBO every frame. If you just want to change the data, use glBufferSubData, as glBufferData goes through the whole, lengthy VBO initialization.
If the data is static, create the VBO only once, then reuse it.
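The "allocate once, then update in place" idea can be sketched roughly as follows. To keep the example compilable without an OpenGL context, the two upload paths are hidden behind a hypothetical BufferBackend interface; BackgroundBuffer and sync are also made-up names. In real code, allocate would call glBufferData on a buffer id generated once, and update would call glBufferSubData with offset 0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical abstraction over the two GL upload paths.
struct BufferBackend {
    virtual ~BufferBackend() = default;
    // Real code: glBindBuffer, then glBufferData(GL_ARRAY_BUFFER, bytes, data, usage).
    virtual void allocate(const float* data, std::size_t bytes) = 0;
    // Real code: glBindBuffer, then glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, data).
    virtual void update(const float* data, std::size_t bytes) = 0;
};

class BackgroundBuffer {
public:
    explicit BackgroundBuffer(BufferBackend& backend) : m_backend(backend) {}

    // Uploads the vertex data only if it differs from what was uploaded last.
    // Returns true if an upload happened.
    bool sync(const std::vector<float>& vertices) {
        if (vertices == m_uploaded)
            return false;  // unchanged: reuse the existing buffer as-is

        const std::size_t bytes = vertices.size() * sizeof(float);
        if (!m_allocated || vertices.size() != m_uploaded.size()) {
            // First upload, or the size changed: storage must be (re)specified.
            m_backend.allocate(vertices.data(), bytes);
            m_allocated = true;
        } else {
            // Same size, new contents: overwrite the existing storage.
            m_backend.update(vertices.data(), bytes);
        }
        m_uploaded = vertices;
        return true;
    }

private:
    BufferBackend& m_backend;
    std::vector<float> m_uploaded;  // CPU-side copy of the last upload
    bool m_allocated = false;
};
```

Since a 9-patch always has the same number of vertices, resizing a background would take the update path after the first upload. The width/height check in the question's draw method plays the same role as the comparison here and is cheaper than comparing whole arrays.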
Just because the fixed-function stuff is old, deprecated, and generally not recommended, does not necessarily mean it is always slow.
Nor does the fancy 'new' (it's been around a while) functionality with buffers and shaders and such-like necessarily mean that everything will be lightning fast.
When you wrap your drawing in a display list, you are basically passing off a bunch of operations to the driver. This actually gives a fair bit of scope for the driver to optimise what is happening. It may very well package most of what you're doing up into a pretty efficient pre-packaged lump of GPU operations. That may well be slightly more efficient than what happens when you package up your data into buffers and send them off.
That isn't to say that I would recommend sticking with the old-style interface, but certainly I'm not surprised that there are cases where it does a good job.