
VBOs slower than obsolete method of drawing primitives - why?

I'm working on a tile-based OpenGL application in C++. I'm adding a sample screenshot from the application to make this clearer:

I have a Tile class which contains an array of Objects. Each tile can store up to 15 objects (an example is the Tile with a green and a yellow square on it, which holds two objects), so there are 10x10x15 = 1500 Objects to draw in the worst case, because I'm not handling 'empty' ones. Usually it's fewer; in my tests I use around 600 of them. An Object has its own graphic that can be drawn. Each Object belongs to one Tile at a time, but it can be moved (as, for example, the red squares in the picture).

Objects' backgrounds are going to have a border and need to scale nicely, so I'm using the 9-patch pattern to draw them (each background is made of 9 quads).
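
For illustration only, here is a minimal sketch of how a 9-patch can be described as a 4x4 grid of shared corner vertices plus an index list, instead of 9 independent quads. The names `NinePatchMesh` and `buildNinePatch` are hypothetical and not part of the application above. Sharing corners assumes neighbouring cells agree on texture coordinates along their common edges; the code in this question offsets the texture coordinates of the middle cells and uses a second texture for the center, so it would need duplicated vertices there.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container: 16 corner positions and 54 triangle indices.
struct NinePatchMesh {
    std::vector<float> positions;        // x, y per vertex, row by row
    std::vector<std::uint16_t> indices;  // 9 cells * 2 triangles * 3 indices
};

// Builds the grid for a background of the given size and border width.
NinePatchMesh buildNinePatch(float width, float height, float border) {
    // The borders must fit inside the background.
    assert(border >= 0.0f && width >= 2 * border && height >= 2 * border);

    const std::array<float, 4> xs = {0.0f, border, width - border, width};
    const std::array<float, 4> ys = {0.0f, border, height - border, height};

    NinePatchMesh mesh;
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            mesh.positions.push_back(xs[col]);
            mesh.positions.push_back(ys[row]);
        }
    }

    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            const auto topLeft = static_cast<std::uint16_t>(row * 4 + col);
            const auto topRight = static_cast<std::uint16_t>(topLeft + 1);
            const auto bottomLeft = static_cast<std::uint16_t>(topLeft + 4);
            const auto bottomRight = static_cast<std::uint16_t>(topLeft + 5);
            // Two triangles per cell: (v0, v1, v2) and (v2, v3, v0),
            // with v0 = top right, v1 = top left, v2 = bottom left, v3 = bottom right.
            mesh.indices.insert(mesh.indices.end(),
                                {topRight, topLeft, bottomLeft,
                                 bottomLeft, bottomRight, topRight});
        }
    }
    return mesh;
}
```

With this layout the geometry needs 16 vertices rather than 54, and the 54 indices could be shared by every background because they do not depend on the size.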

Without drawing the Tiles (their Objects, to be precise), my application runs at around 600 fps.

At first I used the obsolete method to draw those Tiles: glBegin(GL_QUADS)/glEnd() and display lists. That drawing caused a big performance drop, from 600 to 320 fps. This is how I drew them:

bool Background::draw(const TPoint& pos, int width, int height)
{
    if(width <= 0 || height <= 0)
        return false;
    //glFrontFace(GL_CW);
    glPushMatrix();
    glTranslatef((GLfloat)pos.x, (GLfloat)pos.y, 0.0f);     // Move background to right direction
    if((width != m_savedWidth) || (height != m_savedHeight))    // If size to draw is different than the one saved in display list,
        // then recalculate everything and save in display list
    {
        // That size will be now saved in display list
        m_savedWidth = width;
        m_savedHeight = height;

        // If this background doesn't have unique display list id specified yet,
        // then let OpenGL generate one
        if(m_displayListId == NO_DISPLAY_LIST_ID)
        {
            GLuint displayList;
            displayList = glGenLists(1);
            m_displayListId = displayList;
        }

        glNewList(m_displayListId, GL_COMPILE);

        GLfloat texelCentersOffsetX = (GLfloat)1/(2*m_width);

        // Instead of coordinates range 0..1 we need to specify new ones
        GLfloat maxTexCoordWidth = m_bTiling    ? (GLfloat)width/m_width    :   1.0;
        GLfloat maxTexCoordHeight = m_bTiling   ? (GLfloat)height/m_height  :   1.0;

        GLfloat maxTexCoordBorderX = (GLfloat)m_borderWidth/m_width;
        GLfloat maxTexCoordBorderY = (GLfloat)m_borderWidth/m_height;

        /* 9-cell-pattern

        -------------------
        | 1 |    2    | 3 |
        -------------------
        |   |         |   |
        | 4 |    9    | 5 |
        |   |         |   |
        -------------------
        | 6 |    7    | 8 |
        -------------------

        */

        glBindTexture(GL_TEXTURE_2D, m_texture);               // Select Our Texture

        // Top left quad [1]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(0.0, maxTexCoordBorderY);
            glVertex2i(0, 0 + m_borderWidth);

            // Top left
            glTexCoord2f(0.0, 0.0);
            glVertex2i(0, 0);

            // Top right
            glTexCoord2f(maxTexCoordBorderX, 0.0);
            glVertex2i(0 + m_borderWidth, 0);

            // Bottom right
            glTexCoord2f(maxTexCoordBorderX, maxTexCoordBorderY);
            glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);
        glEnd();

        // Top middle quad [2]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, maxTexCoordBorderY);
            glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);

            // Top left
            glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, 0.0);
            glVertex2i(0 + m_borderWidth, 0);

            // Top right
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, 0.0);
            glVertex2i(0 + width - m_borderWidth, 0);

            // Bottom right
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);
        glEnd();

        // Top right quad [3]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);

            // Top left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, 0.0);
            glVertex2i(0 + width - m_borderWidth, 0);

            // Top right
            glTexCoord2f(1.0, 0.0);
            glVertex2i(0 + width, 0);

            // Bottom right
            glTexCoord2f(1.0, maxTexCoordBorderY);
            glVertex2i(0 + width, 0 + m_borderWidth);
        glEnd();

        // Middle left quad [4]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(0.0, (GLfloat)1.0 - maxTexCoordBorderY );
            glVertex2i(0, 0 + height - m_borderWidth);

            // Top left
            glTexCoord2f(0.0, maxTexCoordBorderY );
            glVertex2i(0, 0 + m_borderWidth);

            // Top right
            glTexCoord2f(maxTexCoordBorderX, maxTexCoordBorderY );
            glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);

            // Bottom right
            glTexCoord2f(maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY );
            glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);
        glEnd();

        // Middle right quad [5]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);

            // Top left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);

            // Top right
            glTexCoord2f(1.0, maxTexCoordBorderY);
            glVertex2i(0 + width, 0 + m_borderWidth);

            // Bottom right
            glTexCoord2f(1.0, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + width, 0 + height - m_borderWidth);
        glEnd();

        // Bottom left quad [6]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(0.0f, 1.0);
            glVertex2i(0, 0 + height);

            // Top left
            glTexCoord2f(0.0f, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0, 0 + height - m_borderWidth);

            // Top right
            glTexCoord2f(maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);

            // Bottom right
            glTexCoord2f(maxTexCoordBorderX, 1.0);
            glVertex2i(0 + m_borderWidth, 0 + height);
        glEnd();

        // Bottom middle quad [7]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, 1.0);
            glVertex2i(0 + m_borderWidth, 0 + height);

            // Top left
            glTexCoord2f(maxTexCoordBorderX + texelCentersOffsetX, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);

            // Top right
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);

            // Bottom right
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX - texelCentersOffsetX, 1.0);
            glVertex2i(0 + width - m_borderWidth, 0 + height);
        glEnd();

        // Bottom right quad [8]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, 1.0);
            glVertex2i(0 + width - m_borderWidth, 0 + height);

            // Top left
            glTexCoord2f((GLfloat)1.0 - maxTexCoordBorderX, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);

            // Top right
            glTexCoord2f(1.0, (GLfloat)1.0 - maxTexCoordBorderY);
            glVertex2i(0 + width, 0 + height - m_borderWidth);

            // Bottom right
            glTexCoord2f(1.0, 1.0);
            glVertex2i(0 + width, 0 + height);
        glEnd();

        GLfloat xTexOffset;
        GLfloat yTexOffset;

        if(m_borderWidth > 0)
        {
            glBindTexture(GL_TEXTURE_2D, m_centerTexture);     // If there's a border, we have to use
            // second texture now for middle quad
            xTexOffset = 0.0;                                  // We are using another texture, so middle middle quad
            yTexOffset = 0.0;                                  // has to be texture with a whole texture
        }
        else
        {
            // Don't bind any texture here - we're still using the same one

            xTexOffset = maxTexCoordBorderX;                   // But it implies using offset which equals
            yTexOffset = maxTexCoordBorderY;                   // maximum texture coordinates
        }

        // Middle middle quad [9]
        glBegin(GL_QUADS);
            // Bottom left
            glTexCoord2f(xTexOffset, maxTexCoordHeight - yTexOffset);
            glVertex2i(0 + m_borderWidth, 0 + height - m_borderWidth);

            // Top left
            glTexCoord2f(xTexOffset, yTexOffset);
            glVertex2i(0 + m_borderWidth, 0 + m_borderWidth);

            // Top right
            glTexCoord2f(maxTexCoordWidth - xTexOffset, yTexOffset);
            glVertex2i(0 + width - m_borderWidth, 0 + m_borderWidth);

            // Bottom right
            glTexCoord2f(maxTexCoordWidth - xTexOffset, maxTexCoordHeight - yTexOffset);
            glVertex2i(0 + width - m_borderWidth, 0 + height - m_borderWidth);
        glEnd();

        glEndList();
    }

    glCallList(m_displayListId); // Now we can call earlier or now created display list

    glPopMatrix();

    return true;
}

There is probably too much code there, but I wanted to show everything. The main thing about this version is the use of display lists and glVertex2i, which are deprecated.

I thought the cause of the slowdown was this obsolete method, which I've read is quite slow, so I decided to switch to VBOs. I used this tutorial and changed my method accordingly:

bool Background::draw(const TPoint& pos, int width, int height)
{
    if(width <= 0 || height <= 0)
        return false;

    glPushMatrix();
    glTranslatef((GLfloat)pos.x, (GLfloat)pos.y, 0.0f);             // Move background to right direction
    if((width != m_savedWidth) || (height != m_savedHeight))        // If the size to draw differs from the one the VBO was built for,
                                                                    // then recalculate everything and rebuild the VBO
    {
        // That size is now the one stored in the VBO
        m_savedWidth = width;
        m_savedHeight = height;

        GLfloat texelCentersOffsetX = (GLfloat)1/(2*m_width);

        // Instead of coordinates range 0..1 we need to specify new ones
        GLfloat maxTexCoordWidth = m_bTiling    ? (GLfloat)width/m_width    :   1.0;
        GLfloat maxTexCoordHeight = m_bTiling   ? (GLfloat)height/m_height  :   1.0;

        GLfloat maxTexCoordBorderX = (GLfloat)m_borderWidth/m_width;
        GLfloat maxTexCoordBorderY = (GLfloat)m_borderWidth/m_height;

        /* 9-cell-pattern, each number represents one quad

        -------------------
        | 1 |    2    | 3 |
        -------------------
        |   |         |   |
        | 4 |    9    | 5 |
        |   |         |   |
        -------------------
        | 6 |    7    | 8 |
        -------------------

        */

        /* How vertices are distributed on one quad made of two triangles

        v1 ------ v0
        |       /  |
        |     /    |
        |  /       |
        v2 ------ v3

        */

        GLfloat vertices[] = { 
                                // Top left quad [1]
                                m_borderWidth, 0, 0,                            // v0
                                0, 0, 0,                                        // v1       
                                0, m_borderWidth, 0,                            // v2               

                                0, m_borderWidth, 0,                            // v2
                                m_borderWidth, m_borderWidth, 0,                // v3
                                m_borderWidth, 0, 0,                            // v0

                                // Top middle quad [2]
                                width-m_borderWidth, 0, 0,                      // v0
                                m_borderWidth, 0, 0,                            // v1
                                m_borderWidth, m_borderWidth, 0,                // v2

                                m_borderWidth, m_borderWidth, 0,                // v2
                                width-m_borderWidth, m_borderWidth, 0,          // v3
                                width-m_borderWidth, 0, 0,                      // v0

                                // Top right quad [3]
                                width, 0, 0,                                    // v0  
                                width-m_borderWidth, 0, 0,                      // v1
                                width-m_borderWidth, m_borderWidth, 0,          // v2

                                width-m_borderWidth, m_borderWidth, 0,          // v2
                                width, m_borderWidth, 0,                        // v3
                                width, 0, 0,                                    // v0

                                // Middle left quad [4]
                                m_borderWidth, m_borderWidth, 0,                // v0
                                0, m_borderWidth, 0,                            // v1
                                0, height-m_borderWidth, 0,                     // v2

                                0, height-m_borderWidth, 0,                     // v2
                                m_borderWidth, height-m_borderWidth, 0,         // v3
                                m_borderWidth, m_borderWidth, 0,                // v0

                                // Middle right quad [5]
                                width, m_borderWidth, 0,                        // v0
                                width-m_borderWidth, m_borderWidth, 0,          // v1
                                width-m_borderWidth, height-m_borderWidth, 0,   // v2

                                width-m_borderWidth, height-m_borderWidth, 0,   // v2
                                width, height-m_borderWidth, 0,                 // v3
                                width, m_borderWidth, 0,                        // v0

                                // Bottom left quad [6]
                                m_borderWidth, height-m_borderWidth, 0,         // v0
                                0, height-m_borderWidth, 0,                     // v1
                                0, height, 0,                                   // v2

                                0, height, 0,                                   // v2
                                m_borderWidth, height, 0,                       // v3
                                m_borderWidth, height-m_borderWidth, 0,         // v0

                                // Bottom middle quad [7]
                                width-m_borderWidth, height-m_borderWidth, 0,   // v0
                                m_borderWidth, height-m_borderWidth, 0,         // v1
                                m_borderWidth, height, 0,                       // v2

                                m_borderWidth, height, 0,                       // v2
                                width-m_borderWidth, height, 0,                 // v3
                                width-m_borderWidth, height-m_borderWidth, 0,   // v0

                                // Bottom right quad [8]
                                width, height-m_borderWidth, 0,                 // v0
                                width-m_borderWidth, height-m_borderWidth, 0,   // v1
                                width-m_borderWidth, height, 0,                 // v2

                                width-m_borderWidth, height, 0,                 // v2
                                width, height, 0,                               // v3
                                width, height-m_borderWidth, 0,                 // v0

                                // Middle middle quad [9]
                                width-m_borderWidth, m_borderWidth, 0,          // v0
                                m_borderWidth, m_borderWidth, 0,                // v1
                                m_borderWidth, height-m_borderWidth, 0,         // v2

                                m_borderWidth, height-m_borderWidth, 0,         // v2
                                width-m_borderWidth, height-m_borderWidth, 0,   // v3
                                width-m_borderWidth, m_borderWidth, 0           // v0
                            };

        copy(vertices, vertices + 162, m_vCoords);              // 162, because we have 162 coordinates 


        int dataSize = 162 * sizeof(GLfloat);
        m_vboId = createVBO(m_vCoords, dataSize);

    }

    // bind VBOs for vertex array
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_vboId);          // for vertex coordinates

    glEnableClientState(GL_VERTEX_ARRAY);                   // activate vertex coords array
        glVertexPointer(3, GL_FLOAT, 0, 0);                     
        glDrawArrays(GL_TRIANGLES, 0, 54);                      // count is in vertices: 9 quads * 6 vertices (162 floats / 3)
    glDisableClientState(GL_VERTEX_ARRAY);                  // deactivate vertex array

    // bind with 0, so, switch back to normal pointer operation
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, NO_VBO_ID);

    glPopMatrix();

    return true;
}

It is quite similar to the previous version, but instead of a display list and glVertex2i() I used a VBO created from data stored in an array.

But the results disappointed me: I got a performance drop instead of a boost, barely ~260 fps. I must note that in this version I haven't yet implemented textures, so for now there are only untextured quads.

I've read a few articles to find the reason for the slowdown and found that it may be due to the large number of small VBOs, and that I should probably have one VBO containing the data for all backgrounds instead of a separate VBO for each one. The problem is that Objects can move around and have different textures (and a texture atlas is not a good solution for me), so it would be difficult to update the data for those Objects that changed their state. For now, when an Object changes, I just recreate its VBO and the other VBOs stay untouched.
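
To make the single-buffer idea concrete, here is a rough CPU-side sketch, assuming every background occupies a fixed-size slot of 162 floats (the size of the vertex array above). `SharedVertexStore` and `ByteRange` are hypothetical names. The returned range is what could be handed to glBufferSubData(target, offset, size, data) to refresh only the changed Object, and `firstVertex` is what could be passed as the `first` argument of glDrawArrays, so a different texture can still be bound before each Object's draw call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 9 quads * 2 triangles * 3 vertices, each vertex being 3 floats.
constexpr std::size_t kVerticesPerObject = 9 * 2 * 3;
constexpr std::size_t kFloatsPerObject = kVerticesPerObject * 3;

// Byte range inside the shared buffer that changed and needs re-uploading.
struct ByteRange {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical CPU mirror of one big vertex buffer with one slot per Object.
class SharedVertexStore {
public:
    explicit SharedVertexStore(std::size_t objectCount)
        : m_data(objectCount * kFloatsPerObject, 0.0f) {}

    // Overwrites one slot and reports which bytes were touched.
    ByteRange update(std::size_t slot, const std::vector<float>& vertices) {
        assert(vertices.size() == kFloatsPerObject);
        assert((slot + 1) * kFloatsPerObject <= m_data.size());
        std::copy(vertices.begin(), vertices.end(),
                  m_data.begin() + static_cast<std::ptrdiff_t>(slot * kFloatsPerObject));
        return ByteRange{slot * kFloatsPerObject * sizeof(float),
                         kFloatsPerObject * sizeof(float)};
    }

    // Index of the slot's first vertex, usable as the draw call's start vertex.
    std::size_t firstVertex(std::size_t slot) const { return slot * kVerticesPerObject; }

    const float* data() const { return m_data.data(); }

private:
    std::vector<float> m_data;
};
```

In this arrangement the buffer is bound once per frame, a moved Object only costs one small sub-range upload, and the number of draw calls stays the same as with separate VBOs.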

So my question is: what am I doing wrong? Is using a larger number (~600) of small VBOs really slower than the obsolete method of drawing with glVertex2i? And what would be a better solution in my case, even if not the best one?

Piotr Chojnacki asked Feb 27 '13 15:02


2 Answers

By the looks of it, you're recreating the VBO with every frame. If you just want to change the data use glBufferSubData, as glBufferData goes through the whole, lengthy VBO initialization.

If the data is static, create the VBO only once, then reuse it.
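
A minimal sketch of that pattern follows, written so it can run without a GL context: the two callbacks in `BufferBackend` are hypothetical stand-ins for the real calls (glGenBuffers plus glBufferData for `create`, glBufferSubData for `update`), and `BackgroundGeometry` is an invented name. It relies on buffer id 0 never being handed out by OpenGL, and on the 9-patch always having the same vertex count so that an in-place update fits the original allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-ins for the GL calls, so the logic can be exercised without a context.
struct BufferBackend {
    std::function<unsigned(const float*, std::size_t)> create;        // allocate + fill, returns id
    std::function<void(unsigned, const float*, std::size_t)> update;  // overwrite existing storage
};

class BackgroundGeometry {
public:
    explicit BackgroundGeometry(BufferBackend backend) : m_backend(std::move(backend)) {}

    // True only when the buffer is missing or was built for another size.
    bool needsUpload(int width, int height) const {
        return m_bufferId == 0 || width != m_width || height != m_height;
    }

    // Allocates on first use, afterwards only rewrites the existing buffer.
    void upload(int width, int height, const std::vector<float>& vertices) {
        assert(!vertices.empty());
        const std::size_t bytes = vertices.size() * sizeof(float);
        if (m_bufferId == 0) {
            m_bufferId = m_backend.create(vertices.data(), bytes);
        } else {
            m_backend.update(m_bufferId, vertices.data(), bytes);
        }
        m_width = width;
        m_height = height;
    }

    unsigned id() const { return m_bufferId; }

private:
    BufferBackend m_backend;
    unsigned m_bufferId = 0;  // 0 means "no buffer yet"
    int m_width = 0;
    int m_height = 0;
};
```

Per frame, the draw routine would check `needsUpload` first and skip all buffer work when the size is unchanged, so the expensive allocation path runs once per background.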

datenwolf answered Oct 18 '22 22:10


Just because the fixed-function stuff is old, deprecated, and generally not recommended, does not necessarily mean it is always slow.

Nor does the fancy 'new' (it's been around a while) functionality with buffers and shaders and such-like necessarily mean that everything will be lightning fast.

When you wrap your drawing in a display list, you are basically passing off a bunch of operations to the driver. This actually gives a fair bit of scope for the driver to optimise what is happening. It may very well package most of what you're doing up into a pretty efficient pre-packaged lump of GPU operations. That may well be slightly more efficient than what happens when you package up your data into buffers and send them off.

That isn't to say that I would recommend sticking with the old-style interface, but certainly I'm not surprised that there are cases where it does a good job.

JasonD answered Oct 18 '22 23:10