When are VBOs faster than "simple" OpenGL primitives (glBegin())?

After many years of hearing about Vertex Buffer Objects (VBOs), I finally decided to experiment with them (my stuff isn't normally performance critical, obviously...)

I'll describe my experiment below, but to make a long story short, I'm seeing indistinguishable performance between "simple" immediate mode (glBegin()/glEnd()), vertex array (CPU side), and VBO (GPU side) rendering. I'm trying to understand why this is, and under what conditions I can expect VBOs to significantly outshine their primitive (pun intended) ancestors.

Experiment Details

For the experiment, I generated a (static) 3D Gaussian cloud of a large number of points. Each point has vertex & color information associated with it. Then I rotated the camera around the cloud in successive frames in sort of an "orbiting" behavior. Again, the points are static, only the eye moves (via gluLookAt()). The data are generated once prior to any rendering & stored in two arrays for use in the rendering loop.
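
For reference, a data set like this could be generated along the following lines. This is only a minimal sketch, not the code used for the measurements: the names `approx_gaussian` and `make_point_cloud`, the random colors, and the sum-of-12-uniforms approximation to a normal distribution are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Approximate standard-normal sample (assumption: an approximation is good
 * enough for a test cloud). The sum of 12 uniform [0,1) values has mean 6
 * and variance 1, so subtracting 6 gives a roughly N(0,1) value in [-6, 6). */
static float approx_gaussian(void)
{
    double sum = 0.0;
    for (int i = 0; i < 12; ++i)
        sum += (double)rand() / ((double)RAND_MAX + 1.0);
    return (float)(sum - 6.0);
}

/* Fills two caller-owned arrays of 3*n floats each: xyz positions drawn from
 * the approximate Gaussian, and rgb colors in [0,1] (hypothetical choice).
 * Returns 0 on success, -1 if either pointer is NULL. */
int make_point_cloud(float *positions, float *colors, size_t n, unsigned seed)
{
    if (positions == NULL || colors == NULL)
        return -1;

    srand(seed);
    for (size_t i = 0; i < 3 * n; ++i) {
        positions[i] = approx_gaussian();
        colors[i] = (float)((double)rand() / (double)RAND_MAX);
    }
    return 0;
}
```

The two arrays are laid out as tightly packed triples, which is the layout that both a per-point glColor3fv()/glVertex3fv() loop and a glDrawArrays() call can consume without conversion.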

For immediate-mode rendering, the entire data set is rendered in a single glBegin()/glEnd() block with a loop containing one call each to glColor3fv() and glVertex3fv() per point.

For vertex array and VBO rendering, the entire data set is rendered with a single glDrawArrays() call.
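
To make the difference between the paths concrete, the number of OpenGL calls issued per frame can be counted directly from the description above. The sketch below is plain C with no OpenGL dependency; the helper names are hypothetical, and it assumes that array pointer setup and buffer binding happen once outside the frame loop and so are not counted.

```c
#include <stddef.h>

/* Calls per frame on the glBegin()/glEnd() path as described: one glBegin(),
 * one glEnd(), plus one glColor3fv() and one glVertex3fv() per point. */
size_t immediate_mode_calls(size_t num_points)
{
    return 2 * num_points + 2;
}

/* Calls per frame on the vertex array and VBO paths as described: a single
 * glDrawArrays(). Setup calls are assumed to be done once, outside the loop
 * (assumption, not stated in the post). */
size_t draw_arrays_calls(size_t num_points)
{
    (void)num_points; /* the call count does not depend on the data size */
    return 1;
}
```

For 1M points this gives 2,000,002 calls per frame on the glBegin()/glEnd() path against 1 on the glDrawArrays() path, which is where any CPU-side saving would come from.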

Then, I simply run it for a minute or so in a tight loop and measure average FPS with the high performance timer.

Performance Results

As mentioned above, performance was indistinguishable on both my desktop machine (XP x64, 8GB RAM, 512 MB Quadro 1700) and my laptop (XP32, 4GB RAM, 256 MB Quadro NVS 110). It did scale as expected with the number of points, however. Vsync was disabled for all runs.

Specific results from laptop runs (rendering w/GL_POINTS):

glBegin()/glEnd():

1K pts --> 603 FPS
10K pts --> 401 FPS
100K pts --> 97 FPS
1M pts --> 14 FPS

Vertex Arrays (CPU side):

1K pts --> 603 FPS
10K pts --> 402 FPS
100K pts --> 97 FPS
1M pts --> 14 FPS

Vertex Buffer Objects (GPU side):

1K pts --> 604 FPS
10K pts --> 399 FPS
100K pts --> 95 FPS
1M pts --> 14 FPS

I rendered the same data with GL_TRIANGLE_STRIP and got similarly indistinguishable results (though slower, as expected, due to the extra rasterization). I can post those numbers too if anybody wants them.
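
One way to read the GL_POINTS numbers is to convert them to points per second. The sketch below does that arithmetic for the glBegin()/glEnd() rows (the other two paths are within a few FPS of these). The helper names are hypothetical, and the 24 bytes per point figure assumes three 32-bit floats each for position and color.

```c
#include <stddef.h>

/* One row of the laptop GL_POINTS results quoted above. */
struct run {
    double points; /* points drawn per frame */
    double fps;    /* measured average frames per second */
};

/* glBegin()/glEnd() rows from the laptop results. */
static const struct run laptop_runs[] = {
    {1e3, 603.0}, {1e4, 401.0}, {1e5, 97.0}, {1e6, 14.0},
};

/* Assumption: 3 floats of position + 3 floats of color, 4 bytes per float. */
#define BYTES_PER_POINT 24.0

/* Points pushed through the pipeline per second. */
double points_per_second(struct run r)
{
    return r.points * r.fps;
}

/* Upper bound on vertex data moved per second, reached only if every point
 * is re-sent to the GPU on every frame. */
double bytes_per_second(struct run r)
{
    return points_per_second(r) * BYTES_PER_POINT;
}
```

With the figures above this works out to roughly 0.6M, 4.0M, 9.7M, and 14M points per second, so throughput climbs steeply at small sizes and then flattens between 100K and 1M points. At 1M points and 14 FPS, the upper bound on vertex traffic is about 336 MB per second.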

Question(s)

What gives?
What do I have to do to realize the promised performance gain of VBOs?
What am I missing?

asked Jan 10 '09 04:01

Drew Hall

1 Answer

There are a lot of factors in optimizing 3D rendering. Usually there are 4 bottlenecks:

CPU (creating vertices, API calls, everything else)
Bus (CPU<->GPU transfer)
Vertex (vertex shader or fixed-function pipeline execution)
Pixel (fill, fragment shader execution and ROPs)

Your test gives skewed results because you have plenty of spare CPU (and bus) capacity while maxing out vertex or pixel throughput. VBOs exist to reduce CPU load (fewer API calls, and DMA transfers that run in parallel with the CPU). Since you are not CPU bound, they give you no gain. This is optimization 101. In a game, for example, CPU time becomes precious because it is needed for other things like AI and physics, not just for issuing tons of API calls. It is easy to see that writing vertex data (3 floats, for example) directly through a memory pointer is much faster than calling a function that writes 3 floats to memory: at the very least you save the cycles for the call.
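
The last point can be illustrated without any OpenGL at all. The sketch below is a mock, not driver code: `vertex_sink` is a hypothetical stand-in for driver-owned vertex storage, and `sink_vertex3fv` only imitates the shape of a per-vertex entry point such as glVertex3fv(). Both paths produce the same bytes; they differ only in how many function calls it takes to get them there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a vertex buffer owned by the driver. */
struct vertex_sink {
    float *data;  /* destination storage, sized by the caller */
    size_t count; /* number of floats written so far */
};

/* Per-vertex entry point: copies 3 floats, one function call per vertex. */
void sink_vertex3fv(struct vertex_sink *sink, const float *v)
{
    memcpy(sink->data + sink->count, v, 3 * sizeof(float));
    sink->count += 3;
}

/* Path A: num_verts separate calls, one per vertex. */
void submit_per_vertex(struct vertex_sink *sink, const float *verts,
                       size_t num_verts)
{
    for (size_t i = 0; i < num_verts; ++i)
        sink_vertex3fv(sink, verts + 3 * i);
}

/* Path B: write the whole array through the pointer in one bulk copy. */
void submit_bulk(struct vertex_sink *sink, const float *verts,
                 size_t num_verts)
{
    memcpy(sink->data + sink->count, verts, 3 * num_verts * sizeof(float));
    sink->count += 3 * num_verts;
}
```

In this toy version a compiler may well inline the per-vertex function and erase the difference. Calls into a real OpenGL implementation typically cross a library boundary and cannot be inlined, so there the per-call overhead is paid once per vertex.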

answered Sep 20 '22 01:09

starmole

Tags: performance, graphics, opengl, vbo, vertex-buffer