So I'm working with a 2D skeletal animation system.
There are X bones, and each bone has at least one part (a quad made of two triangles). On average I have maybe 20 bones and 30 parts. Most bones depend on a parent, and the bones move every frame. There are up to 1000 frames per animation, and I'm using about 50 animations, so around 50,000 frames are loaded in memory at any one time. The parts differ between instances of the skeleton.
The first approach I took was to calculate the position/rotation of each bone, and build up a vertex array, which consisted of this, for each part:
[x1,y1,u1,v1],[x2,y2,u2,v2],[x3,y3,u3,v3],[x4,y4,u4,v4]
I then passed this to glDrawElements each frame.
This looks fine, covers all the scenarios I need, and doesn't use much memory, but it performs like a dog. On an iPod 4, I could get maybe 15fps with 10 of these skeletons being rendered.
I worked out that most of the performance was being eaten up by copying so much vertex data each frame. I decided to go to the other extreme and pre-calculate the animations: at startup I build a vertex buffer for each character that contains the xyuv coordinates for every frame of every part. Then, for a given time, I calculate the index of the frame to use and a delta value, which is passed to the shader and used to interpolate between the current and the next frame's XY positions.
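To make the frame lookup concrete, here is a minimal sketch of finding the keyframe index and the delta for one bone. It assumes each bone has a sorted, strictly increasing list of keyframe times in seconds; the function name keyframe_and_alpha and the clamping behaviour are hypothetical, not taken from the actual code.

```python
from bisect import bisect_right

def keyframe_and_alpha(key_times, time_s):
    """Return (index, alpha) so the pose is mix(key[index], key[index + 1], alpha).

    key_times: sorted, strictly increasing keyframe times for one bone (assumed).
    Times outside the keyframe range clamp to the first or last segment.
    """
    if len(key_times) < 2:
        return 0, 0.0
    # Last keyframe at or before time_s, clamped so index + 1 stays valid.
    index = bisect_right(key_times, time_s) - 1
    index = max(0, min(index, len(key_times) - 2))
    start, end = key_times[index], key_times[index + 1]
    alpha = (time_s - start) / (end - start)
    return index, max(0.0, min(alpha, 1.0))
```

The alpha returned here plays the role of the per-bone delta that gets passed to the shader, and index selects which frame's vertices to bind.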
The vertices look like this, per frame:
[--------------------- Frame 1 ---------------------],[------- Frame 2 ------]
[x1,y1,u1,v1,boneIndex],[x2, ...],[x3, ...],[x4, ...],[x1, ...][x2, ...][....]
The vertex shader looks like this:
// Position in the current frame and in the next frame
attribute vec4 a_position;
attribute vec4 a_nextPosition;
attribute vec2 a_texCoords;
attribute float a_boneIndex;

uniform mat4 u_projectionViewMatrix;
// Per-bone interpolation factor between the current and the next frame
uniform float u_boneAlpha[255];

varying vec2 v_texCoords;

void main() {
    float alpha = u_boneAlpha[int(a_boneIndex)];
    vec4 position = mix(a_position, a_nextPosition, alpha);
    gl_Position = u_projectionViewMatrix * position;
    v_texCoords = a_texCoords;
}
Now performance is great: with 10 of these on screen, it sits comfortably at 50fps. But it uses a metric ton of memory. I've reduced that somewhat by giving up some precision on xyuv, which are now ushorts.
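For a rough sense of scale, the baked buffer size can be estimated from the numbers above. This is only a back-of-the-envelope sketch: it assumes 4 vertices per part, that xyuv were 32-bit floats before the switch to ushorts, and it ignores the bone index and any per-character duplication. The helper name baked_buffer_bytes is made up for illustration.

```python
def baked_buffer_bytes(frames, parts, bytes_per_vertex, vertices_per_part=4):
    """Size of a buffer that stores every vertex of every part for every frame."""
    return frames * parts * vertices_per_part * bytes_per_vertex

FRAMES = 50_000  # "around 50,000 frames" from the post
PARTS = 30       # average number of parts per skeleton from the post

# x, y, u, v stored as 32-bit floats (assumed original format): 16 bytes per vertex
as_floats = baked_buffer_bytes(FRAMES, PARTS, bytes_per_vertex=4 * 4)
# x, y, u, v stored as 16-bit ushorts: 8 bytes per vertex
as_ushorts = baked_buffer_bytes(FRAMES, PARTS, bytes_per_vertex=4 * 2)
```

Under those assumptions the buffer is about 96 MB as floats and about 48 MB as ushorts, for a single set of parts.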
There's also the problem that the bone dependencies are lost. If there are two bones, a parent and a child, where the child has keyframes at 0s and 2s and the parent has keyframes at 0s, 0.5s, 1.5s and 2s, then the child won't follow the parent's changes between 0.5s and 1.5s as it should.
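A small numeric sketch of that problem, using made-up data: the parent rotates to 90 degrees between 0.5s and 1.5s, and the child sits 1 unit along the parent's x axis with keyframes only at 0s and 2s. All names and values here are hypothetical.

```python
import math

def lerp(a, b, t):
    return a + (b - a) * t

def sample(keys, time_s):
    """Linearly interpolate a sorted list of (time, value) keyframes."""
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t0 <= time_s <= t1:
            return lerp(v0, v1, (time_s - t0) / (t1 - t0))
    return keys[-1][1]

# Hypothetical parent rotation keyframes, in degrees.
parent_rotation = [(0.0, 0.0), (0.5, 90.0), (1.5, 90.0), (2.0, 0.0)]

def child_world_position(time_s):
    """Where the child really is: it follows the parent's rotation."""
    angle = math.radians(sample(parent_rotation, time_s))
    return (math.cos(angle), math.sin(angle))

# Baked approach: world positions stored only at the child's own keyframes.
baked = [(0.0, child_world_position(0.0)), (2.0, child_world_position(2.0))]

def baked_child_position(time_s):
    """What the baked buffer gives: a straight blend between 0s and 2s."""
    (t0, p0), (t1, p1) = baked
    alpha = (time_s - t0) / (t1 - t0)
    return (lerp(p0[0], p1[0], alpha), lerp(p0[1], p1[1], alpha))
```

At t = 1s the hierarchy puts the child at roughly (0, 1), while the baked blend leaves it at (1, 0), because both of the child's baked keyframes were captured while the parent was at 0 degrees.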
I came up with a fix for this bone problem: forcing the child to have keyframes at the same points as its parent. But this uses even more memory, and basically kills the point of the bone hierarchy.
This is where I'm at now. I'm trying to find a balance between performance and memory usage. I know there is a lot of redundant information here (the UV coordinates are identical for all the frames of a particular part, so they are repeated ~30 times). And a new buffer has to be created for every set of parts (which have unique XYUV coordinates; positions change because different parts are different sizes).
Right now I'm going to try setting up one vertex array per character, which has the xyuv for all parts, calculating a matrix for each part, and repositioning the parts in the shader. I know this will work, but I'm worried that the performance won't be any better than uploading the XYUVs for each frame, as I was doing at the start.
Is there a better way to do this without losing the performance I've gained?
Are there any wild ideas I could try?
The better way to do this is to transform your 30 parts on the fly rather than make thousands of copies of your parts in different positions. Your vertex buffer will contain one copy of your vertex data, saving tons of memory. Each frame can then be represented by a set of transformations, passed as a uniform to your vertex shader for each bone you draw with a call to glDrawElements(). Each dependent bone's transformation is built relative to the parent bone. Then, depending on where on the continuum between hand-crafted and procedurally generated you want your animations, your sets of transforms can take more or less space and CPU time.
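As a rough illustration of building each bone's transform relative to its parent, here is a minimal CPU-side sketch using 3x3 affine matrices for 2D. The bone layout, the function names, and the assumption that parents are listed before their children are all hypothetical; on the device the resulting matrices would be uploaded as uniforms (for example with glUniformMatrix3fv) and applied in the vertex shader.

```python
import math

def local_matrix(angle_deg, tx, ty):
    """2D affine transform (rotate, then translate) as a 3x3 row-major matrix."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]

def multiply(a, b):
    """Matrix product a * b for 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def world_matrices(parents, local_poses):
    """Combine each bone's local transform with its parent's world transform.

    parents[i] is the index of bone i's parent, or -1 for a root bone.
    Assumes every parent appears in the list before its children.
    """
    world = []
    for parent, local in zip(parents, local_poses):
        world.append(local if parent < 0 else multiply(world[parent], local))
    return world

def transform_point(m, x, y):
    """Apply a 3x3 affine matrix to a 2D point (what the shader would do per vertex)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Hypothetical two-bone arm: the child is attached 2 units along the parent's x axis.
parents = [-1, 0]
pose = [local_matrix(90.0, 0.0, 0.0), local_matrix(0.0, 2.0, 0.0)]
world = world_matrices(parents, pose)

# A vertex 1 unit along the child's x axis ends up at about (0, 3) in world space.
tip = transform_point(world[1], 1.0, 0.0)
```

With this layout the per-frame data is one small matrix per bone rather than four vertices per part, and a child automatically follows its parent because its world matrix is rebuilt from the parent's each frame.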
Jason L. McKesson's free book, Learning Modern 3D Graphics Programming, gives a good explanation of how to accomplish this in chapter 6. The example program at the end of that chapter shows how to use a matrix stack to implement a hierarchical model. I have an OpenGL ES 2.0 on iOS port of this program available.