
The Fastest Way to Batch Calls in WebGL

I'm trying to rewrite the canvas-based rendering for my 2D game engine in WebGL. I've made good progress and can render textures to the WebGL context fine, complete with scaling, rotation and blending. But my performance sucks. On my test laptop, I can get 30 fps in vanilla 2D canvas with 1,000 entities on screen at once; in WebGL, I get 30 fps with only 500 entities on screen. I'd expect the situation to be reversed!

I have a sneaking suspicion that the culprit is all this Float32Array buffer garbage I'm tossing around. Here's my render code:

// boilerplate code and obj coordinates

// grab gl context
var canvas = sys.canvas;
var gl = sys.webgl;
var program = sys.glProgram;

// width and height
var scale = sys.scale;
var tileWidthScaled = Math.floor(tileWidth * scale);
var tileHeightScaled = Math.floor(tileHeight * scale);
var normalizedWidth = tileWidthScaled / this.width;
var normalizedHeight = tileHeightScaled / this.height;

var worldX = targetX * scale;
var worldY = targetY * scale;

this.bindGLBuffer(gl, this.vertexBuffer, sys.glWorldLocation);
this.bufferGLRectangle(gl, worldX, worldY, tileWidthScaled, tileHeightScaled);

gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, this.texture);

var frameX = (Math.floor(tile * tileWidth) % this.width) * scale;
var frameY = (Math.floor(tile * tileWidth / this.width) * tileHeight) * scale;

// fragment (texture) shader
this.bindGLBuffer(gl, this.textureBuffer, sys.glTextureLocation);
this.bufferGLRectangle(gl, frameX, frameY, normalizedWidth, normalizedHeight);

gl.drawArrays(gl.TRIANGLES, 0, 6);

bufferGLRectangle: function (gl, x, y, width, height) {
    var left = x;
    var right = left + width;
    var top = y;
    var bottom = top + height;
    gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
        left, top,
        right, top,
        left, bottom,
        left, bottom,
        right, top,
        right, bottom
    ]), gl.STATIC_DRAW);
},

bindGLBuffer: function (gl, buffer, location) {
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    gl.vertexAttribPointer(location, 2, gl.FLOAT, false, 0, 0);
},

And here are my simple test shaders (these are missing blending, scaling & rotation):

// fragment (texture) shader
precision mediump float;
uniform sampler2D image;
varying vec2 texturePosition;

void main() {
    gl_FragColor = texture2D(image, texturePosition);
}

// vertex shader
attribute vec2 worldPosition;
attribute vec2 vertexPosition;

uniform vec2 canvasResolution;
varying vec2 texturePosition;

void main() {
    vec2 zeroToOne = worldPosition / canvasResolution;
    vec2 zeroToTwo = zeroToOne * 2.0;
    vec2 clipSpace = zeroToTwo - 1.0;

    gl_Position = vec4(clipSpace * vec2(1, -1), 0, 1);
    texturePosition = vertexPosition;
}

Any ideas on how to get better performance? Is there a way to batch my drawArrays? Is there a way to cut down on the buffer garbage?

Thanks!

Abraham Walters asked Mar 22 '13 02:03


2 Answers

There are two big issues I can see here that will adversely affect your performance.

You're creating a lot of temporary Float32Arrays, which are currently expensive to construct (that should get better in the future). It would be far better in this case to create a single array once and overwrite its vertices each time, like so:

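// Allocate once, outside the render loop (6 vertices * 2 floats).
var verts = new Float32Array(12);

// Per rectangle: overwrite the contents instead of allocating a new array.
verts[0] = left; verts[1] = top;
verts[2] = right; verts[3] = top;
// etc...
gl.bufferData(gl.ARRAY_BUFFER, verts, gl.DYNAMIC_DRAW); // DYNAMIC_DRAW: the data changes every frame.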
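Applied to the bufferGLRectangle helper from the question, that could look roughly like the sketch below. The names rectVerts and fillRectangle are made up for illustration, the helper is written as a plain function rather than a method, and gl.DYNAMIC_DRAW is swapped in for gl.STATIC_DRAW on the assumption that the data is rewritten every frame.

```javascript
// Allocated once and reused for every rectangle (hypothetical name).
var rectVerts = new Float32Array(12); // 6 vertices * 2 floats

// Writes the two triangles of a rectangle into `out` without allocating.
// Vertex order matches the question's bufferGLRectangle.
function fillRectangle(out, x, y, width, height) {
    var left = x, right = x + width;
    var top = y, bottom = y + height;
    out[0] = left;   out[1] = top;
    out[2] = right;  out[3] = top;
    out[4] = left;   out[5] = bottom;
    out[6] = left;   out[7] = bottom;
    out[8] = right;  out[9] = top;
    out[10] = right; out[11] = bottom;
    return out;
}

// Variant of bufferGLRectangle; assumes the target buffer is already bound.
function bufferGLRectangle(gl, x, y, width, height) {
    var data = fillRectangle(rectVerts, x, y, width, height);
    gl.bufferData(gl.ARRAY_BUFFER, data, gl.DYNAMIC_DRAW);
}
```

This removes the per-call allocation, but it still issues one upload and one draw call per quad, which is the larger cost.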

The bigger issue by far, however, is that you're only drawing a single quad at a time. 3D APIs simply aren't designed to do this efficiently. What you want to do is try and squeeze as many triangles as possible into each drawArrays/drawElements call you make.

There are several ways to do that, the most straightforward being to fill up a buffer with as many quads as you can that share the same texture, then draw them all in one go. In rough outline:

var MAX_QUADS_PER_BATCH = 100;
var VERTS_PER_QUAD = 6;
var FLOATS_PER_VERT = 2;
var verts = new Float32Array(MAX_QUADS_PER_BATCH * VERTS_PER_QUAD * FLOATS_PER_VERT);

var quadCount = 0;
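// Appends one quad (two triangles) to the batch and flushes when the batch is full.
function addQuad(left, top, right, bottom) {
    var offset = quadCount * VERTS_PER_QUAD * FLOATS_PER_VERT;

    verts[offset] = left; verts[offset+1] = top;
    verts[offset+2] = right; verts[offset+3] = top;
    verts[offset+4] = left; verts[offset+5] = bottom;
    verts[offset+6] = left; verts[offset+7] = bottom;
    verts[offset+8] = right; verts[offset+9] = top;
    verts[offset+10] = right; verts[offset+11] = bottom;

    quadCount++;

    if (quadCount == MAX_QUADS_PER_BATCH) {
        flushQuads();
    }
}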

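function flushQuads() {
    if (quadCount == 0) {
        return; // Nothing queued, so skip the upload and the draw call.
    }

    gl.bindBuffer(gl.ARRAY_BUFFER, vertsBuffer);
    gl.bufferData(gl.ARRAY_BUFFER, verts, gl.DYNAMIC_DRAW); // Copy the buffer we've been building to the GPU.

    // Make sure vertexAttribPointers are set, etc...

    gl.drawArrays(gl.TRIANGLES, 0, quadCount * VERTS_PER_QUAD); // The count is in vertices, not quads.

    quadCount = 0; // Start the next batch at the beginning of the array.
}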

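// In your render loop (assumes spriteTypes and sprite.instances are arrays)

for (var i = 0; i < spriteTypes.length; i++) {
    var sprite = spriteTypes[i];
    gl.bindTexture(gl.TEXTURE_2D, sprite.texture);

    for (var j = 0; j < sprite.instances.length; j++) {
        var instance = sprite.instances[j];
        addQuad(instance.left, instance.top, instance.right, instance.bottom);
    }

    flushQuads(); // Draw whatever is left before switching textures.
}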

That's an oversimplification, and there are ways to batch even more, but hopefully that gives you an idea of how to start batching your calls for better performance.
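One possible way to batch further, sketched here under assumptions rather than taken from the code above, is to interleave position and texture coordinates in a single buffer and draw with drawElements, so each quad needs 4 vertices instead of 6. Every name below (createQuadBatch, pushQuad, flushBatch, positionLoc, texCoordLoc) is hypothetical, and the sketch assumes the index data is uploaded once to indexBuffer and both attribute arrays are already enabled.

```javascript
var FLOATS_PER_VERTEX = 4; // x, y, u, v
var VERTICES_PER_QUAD = 4; // corners are shared through the index buffer
var INDICES_PER_QUAD = 6;  // two triangles

// Uint16 indices limit one batch to 65536 vertices, i.e. maxQuads <= 16384.
function createQuadBatch(maxQuads) {
    var indices = new Uint16Array(maxQuads * INDICES_PER_QUAD);
    for (var q = 0; q < maxQuads; q++) {
        var v = q * VERTICES_PER_QUAD;
        var i = q * INDICES_PER_QUAD;
        // Corners: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
        indices[i] = v;         indices[i + 1] = v + 1; indices[i + 2] = v + 2;
        indices[i + 3] = v + 2; indices[i + 4] = v + 1; indices[i + 5] = v + 3;
    }
    return {
        maxQuads: maxQuads,
        quadCount: 0,
        vertices: new Float32Array(maxQuads * VERTICES_PER_QUAD * FLOATS_PER_VERTEX),
        indices: indices
    };
}

// Appends one quad; returns false if the batch is full and must be flushed first.
function pushQuad(batch, x, y, w, h, u, v, uw, vh) {
    if (batch.quadCount === batch.maxQuads) { return false; }
    var o = batch.quadCount * VERTICES_PER_QUAD * FLOATS_PER_VERTEX;
    var d = batch.vertices;
    d[o]      = x;     d[o + 1]  = y;     d[o + 2]  = u;      d[o + 3]  = v;
    d[o + 4]  = x + w; d[o + 5]  = y;     d[o + 6]  = u + uw; d[o + 7]  = v;
    d[o + 8]  = x;     d[o + 9]  = y + h; d[o + 10] = u;      d[o + 11] = v + vh;
    d[o + 12] = x + w; d[o + 13] = y + h; d[o + 14] = u + uw; d[o + 15] = v + vh;
    batch.quadCount++;
    return true;
}

// Uploads the vertex data and draws every queued quad in one call.
function flushBatch(gl, batch, vertexBuffer, indexBuffer, positionLoc, texCoordLoc) {
    if (batch.quadCount === 0) { return; }
    var stride = FLOATS_PER_VERTEX * 4; // bytes per vertex
    gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
    gl.bufferData(gl.ARRAY_BUFFER, batch.vertices, gl.DYNAMIC_DRAW);
    gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, stride, 0);
    gl.vertexAttribPointer(texCoordLoc, 2, gl.FLOAT, false, stride, 8);
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
    gl.drawElements(gl.TRIANGLES, batch.quadCount * INDICES_PER_QUAD, gl.UNSIGNED_SHORT, 0);
    batch.quadCount = 0;
}
```

Packing many sprites into one texture atlas fits well with this layout, since quads that share a texture can stay in the same batch and differ only in their u/v values.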

Toji answered Nov 04 '22 08:11


If you use WebGL Inspector, the trace will show you any unnecessary GL instructions you're issuing (they're marked with a bright yellow background). This might give you an idea of how to optimize your rendering.
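A common source of such unnecessary instructions is re-binding a texture that is already bound. As a minimal sketch (createTextureBinder is a hypothetical helper, and it assumes every bind goes through it and only one texture unit is in use), you can remember the last binding and skip repeats:

```javascript
// Returns a function that binds a texture only when it differs from the
// one most recently bound through this helper.
function createTextureBinder(gl) {
    var current = null;
    return function bindTexture(texture) {
        if (texture === current) {
            return false; // Redundant bind, skipped.
        }
        gl.bindTexture(gl.TEXTURE_2D, texture);
        current = texture;
        return true;
    };
}
```

The same idea can be applied to useProgram and bindBuffer, as long as nothing else changes that state behind the helper's back.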

Generally speaking, sort your draw calls so that all calls using the same program are grouped together, and within each group order them by attributes, then textures, then uniforms. This way you'll issue as few GL instructions (and JS instructions) as possible.
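As a rough illustration of that ordering, the sketch below sorts a list of draw-call records by program and then by texture, and counts how many state switches an order would cause. The record shape ({ programId, textureId }) and both function names are assumptions for the example. It also assumes draw order doesn't affect the final image; with blended, overlapping sprites you would only sort within a layer.

```javascript
// Returns a new array ordered by program first, then by texture.
function sortDrawCalls(calls) {
    return calls.slice().sort(function (a, b) {
        if (a.programId !== b.programId) {
            return a.programId - b.programId;
        }
        return a.textureId - b.textureId;
    });
}

// Counts the program and texture switches needed to issue `calls` in order.
function countStateChanges(calls) {
    var changes = 0;
    var program = null;
    var texture = null;
    for (var i = 0; i < calls.length; i++) {
        if (calls[i].programId !== program) {
            program = calls[i].programId;
            changes++;
        }
        if (calls[i].textureId !== texture) {
            texture = calls[i].textureId;
            changes++;
        }
    }
    return changes;
}
```

Comparing countStateChanges(calls) with countStateChanges(sortDrawCalls(calls)) gives a quick estimate of how much sorting would save for a given frame.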

MikaelEmtinger answered Nov 04 '22 09:11