
How to safely decouple rendering from updating the model?

Some game developers I talked with suggested that a performant OpenGL ES based game engine should not handle everything on the main thread. This allows the game engine to perform better on devices with multiple CPU cores.

They said that I could decouple updates from rendering. So if I understood this correctly, a game engine run loop can work like this:

  1. Set up a CADisplayLink which calls a render method.

  2. The render method renders the current world model in the background.

  3. The render method then calls an update method on the main thread.

So while it renders in the background, it can concurrently update the world model for the next iteration.

To me this all feels rather wonky. Can someone explain, or link to an explanation of, how this concurrent rendering and model updating is done in practice? It boggles my mind how this would not lead to problems: what if the model update takes longer than rendering, or the other way around? Who waits for what, and when?

What I'm trying to understand is how this is implemented, both theoretically from a high-level viewpoint and in detail.

asked Dec 02 '13 by Proud Member

2 Answers

In "reality" there are lots of different approaches. There's not "one true way." What's right for you really depends a lot on factors you've not discussed in your question, but I'll take a shot anyway. I'm also not sure how CADisplayLink is what you want here. I would typically think of that being useful for things that require frame synchronization (i.e. lip-syncing audio and video), which it doesn't sound like you need, but let's look at a couple different ways you might do this. I think the crux of your question is whether or not there's a need for a second "layer" between the model and the view.

Background: Single-Threaded (i.e. Main thread only) Example

Let's first consider how a normal, single-threaded app might work:

  1. User events come in on the main thread.
  2. Event handlers trigger calls to controller methods.
  3. Controller methods update model state.
  4. Changes to model state invalidate view state (i.e. -setNeedsDisplay).
  5. When the next frame comes around, the window server triggers a re-rendering of the view state from the current model state and displays the result.

Note that steps 1-4 can happen many times between occurrences of step 5. However, since this is a single-threaded app, while step 5 is happening, steps 1-4 are not happening, and user events get queued up waiting for step 5 to complete. This will typically drop frames in a predictable way, assuming steps 1-4 are "very fast".
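A minimal sketch of that flow, with everything on the main thread. All the names here (controller, playerPosition, drawModel:) are hypothetical:

// Steps 1-2 (in a responder): a user event arrives and is routed to the controller.
- (void)mouseMoved:(NSEvent *)theEvent
{
    [self.controller movePlayerToPoint: theEvent.locationInWindow];
}

// Steps 3-4 (in the controller): mutate the model, then invalidate the view.
- (void)movePlayerToPoint:(CGPoint)point
{
    self.model.playerPosition = point;
    [self.view setNeedsDisplay: YES];
}

// Step 5 (in the view): the window server later drives -drawRect:, which
// re-renders directly from whatever the model state is at that moment.
- (void)drawRect:(NSRect)dirtyRect
{
    [self drawModel: self.controller.model]; // hypothetical drawing helper
}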

Decoupling Rendering From the Main Thread

Now, let's consider the case where you want to offload the rendering to a background thread. In that case, the sequence should look something like this:

  1. User events come in on the main thread.
  2. Event handlers trigger calls to controller methods.
  3. Controller methods update model state.
  4. Changes to model state enqueue an asynchronous rendering task for background execution.
  5. If the asynchronous rendering task completes, it puts the resulting bitmap somewhere known to the view, and calls -setNeedsDisplay on the view.
  6. When the next frame comes around, the window server will trigger a call to -drawRect: on the view, which is now implemented as taking the most recently completed bitmap from the "known shared place" and copying it into the view.

There are a few nuances here. Let's first consider the case where you're merely trying to decouple rendering from the main thread (and ignore, for the moment, utilization of multiple cores -- more later):

You almost certainly never want more than one rendering task running at once. Once you start rendering a frame, you probably don't want to cancel/stop rendering it. You probably want to queue up future, un-started rendering operations into a single-slot queue which always contains the last enqueued, un-started render operation. This should give you reasonable frame-dropping behavior, so you don't get "behind", rendering frames that you should just drop instead.
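One way to get that single-slot behavior is a "latest pending block" slot that is only ever touched on the main thread. A sketch, where pendingRender (a copied block property), renderInFlight (a BOOL property), and renderModel: are all hypothetical and not part of the full example further below:

// pendingRender always holds only the newest un-started render; enqueueing
// overwrites rather than appends, so stale un-started renders are dropped.
- (void)enqueueRenderForSnapshot:(MyModel *)snapshot
{
    self.pendingRender = ^{ [self renderModel: snapshot]; };
    [self startNextRenderIfIdle];
}

- (void)startNextRenderIfIdle
{
    if (self.renderInFlight || !self.pendingRender)
        return;

    self.renderInFlight = YES;
    dispatch_block_t work = self.pendingRender;
    self.pendingRender = nil;

    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        work(); // never cancelled once started
        dispatch_async(dispatch_get_main_queue(), ^{
            self.renderInFlight = NO;
            [self startNextRenderIfIdle]; // pick up whatever arrived meanwhile
        });
    });
}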

If there exists a fully rendered, but not yet displayed, frame, I think you always want to display that frame. With that in mind, you don't want to call -setNeedsDisplay on the view until the bitmap is complete and in the known place.

You will need to synchronize your access across the threads. For instance, when you enqueue the rendering operation, the simplest approach would be to take a read-only snapshot of the model state, and pass that to the render operation, which will only read from the snapshot. This frees you from having to synchronize with the "live" game model (which might be being mutated on the main thread by your controller methods in response to future user events.) The other synchronization challenge is the passing of the completed bitmaps to the view and the calling of -setNeedsDisplay. The easiest approach will likely be to have the image be a property on the view, and to dispatch the setting of that property (with the completed image) and the calling of -setNeedsDisplay over to the main thread.
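A sketch of those two hand-off points (the full example below does the same thing with NSMutableCopying; latestFrame is a hypothetical, main-thread-only property on the view):

// On the main thread, after a model mutation:
MyModel* snapshot = [self.model copy]; // immutable, read-only snapshot
dispatch_async(dispatch_get_global_queue(0, 0), ^{
    NSImage* bitmap = [self renderModel: snapshot]; // reads only the snapshot
    dispatch_async(dispatch_get_main_queue(), ^{
        self.view.latestFrame = bitmap;   // put the bitmap in the known place...
        [self.view setNeedsDisplay: YES]; // ...and only then invalidate
    });
});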

There is a little hitch here: if user events are coming in at a high rate, and you're capable of rendering multiple frames in the duration of a single display frame (1/60s), you could end up rendering bitmaps that get dropped on the floor. This approach has the advantage of always providing the most up-to-date frame to the view at display time (reduced perceived latency), but it has the *dis*advantage that it incurs all the computational costs of rendering the frames that never get shown (i.e. power). The right trade off here will be different for every situation, and may include more fine-grained adjustments.

Utilizing Multiple Cores -- Inherently Parallel Rendering

Assuming that you've decoupled rendering from the main thread as discussed above, and your rendering operation itself is inherently parallelizable, then just parallelize your one rendering operation while continuing to interact with the view the same way, and you should get multi-core parallelism for free. Perhaps you could divide each frame into N tiles, where N is the number of cores, and then once all N tiles finish rendering, you can cobble them together and deliver them to the view as if the rendering operation had been monolithic. If you're working with a read-only snapshot of the model, the setup costs of the N tile tasks should be minimal (since they can all use the same source model).
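With GCD, dispatch_apply is a natural fit for the tiling. A sketch, assuming a read-only snapshot; kTileCount and renderTileRect:fromModel: are hypothetical:

enum { kTileCount = 4 }; // e.g. one tile per core

- (NSImage *)renderFrameFromSnapshot:(MyModel *)snapshot size:(NSSize)size
{
    NSImage* __strong tiles[kTileCount];
    NSImage* __strong * tileSlots = tiles; // blocks can't capture C arrays directly
    const CGFloat tileHeight = size.height / kTileCount;

    // Each tile reads only the immutable snapshot and writes only its own
    // slot, so no locking is needed. dispatch_apply waits for all tiles.
    dispatch_apply(kTileCount, dispatch_get_global_queue(0, 0), ^(size_t i) {
        NSRect tileRect = NSMakeRect(0, i * tileHeight, size.width, tileHeight);
        tileSlots[i] = [self renderTileRect: tileRect fromModel: snapshot]; // hypothetical
    });

    // Cobble the tiles together as if the render had been monolithic.
    NSImage* frame = [[NSImage alloc] initWithSize: size];
    [frame lockFocus];
    for (size_t i = 0; i < kTileCount; i++)
    {
        [tiles[i] drawInRect: NSMakeRect(0, i * tileHeight, size.width, tileHeight)];
    }
    [frame unlockFocus];
    return frame;
}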

Utilizing Multiple Cores -- Inherently Serial Rendering

In the case where your rendering operation is inherently serial (most cases, in my experience), your option for utilizing multiple cores is to have as many rendering operations in flight as you have cores. When one frame completes, it would signal any prior render operations, whether enqueued or still in flight, that they may give up and cancel, and then it would set itself up to be displayed by the view just as in the decoupling-only example.

As mentioned in the decoupling-only case, this always provides the most up-to-date frame to the view at display time, but it incurs all the computational (i.e. power) costs of rendering the frames that never get shown.

When the Model is Slow...

I haven't addressed cases where it's actually the update of the model based on user events that is too slow, because if that's the case, in many ways you no longer care about rendering. How can rendering possibly keep up if the model can't? Furthermore, assuming you find a way to interlock the rendering and the model computations, the rendering is always robbing cycles from the model computations, which are, by definition, always behind. Put differently, you can't hope to render something N times per second when the something itself can't be updated N times per second.

I can conceive of cases where you might be able to offload something like a continuous running physics simulation to a background thread. Such a system would have to manage its real-time performance on its own, and assuming it does that, then you're stuck with the challenge of synchronizing the results from that system with the incoming user event stream. It's a mess.

In the common case, you really want the event handling and model mutation to be way faster than real-time, and have rendering be the "hard part." I struggle to envision a meaningful case where model updating is the limiting factor, yet you still care about decoupling rendering for performance.

Put differently: if your model can only update at 10Hz, it never makes sense to update your view faster than 10Hz. The principal challenge in that situation comes when user events arrive much faster than 10Hz. The challenge then is to discard, sample, or coalesce the incoming events in a way that stays meaningful and provides a good user experience.
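A last-event-wins sample taken at the model's own rate is one simple coalescing strategy. A sketch, assuming the 10Hz model from above; the names are hypothetical:

// High-rate mouse events collapse into a single "latest" value...
- (void)mouseMoved:(NSEvent *)theEvent
{
    _latestMouseLocation = theEvent.locationInWindow; // overwrite; never queue
}

// ...which a timer running at the model's 10Hz budget samples.
- (void)startModelClock
{
    [NSTimer scheduledTimerWithTimeInterval: 0.1
                                     target: self
                                   selector: @selector(modelTick:)
                                   userInfo: nil
                                    repeats: YES];
}

- (void)modelTick:(NSTimer *)timer
{
    // One model update per tick, fed the freshest input rather than a
    // backlog of every intermediate event.
    [self updateModelWithMouseLocation: _latestMouseLocation]; // hypothetical
}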

Some code

Here is a trivial example of how decoupled background rendering might look, based on the Cocoa Application template in Xcode. (I realized after coding up this OS X-based example that the question was tagged with ios, so I guess this is "for whatever it's worth.")

@class MyModel;

@interface MyAppDelegate : NSObject <NSApplicationDelegate>
@property (assign) IBOutlet NSWindow *window;
@property (nonatomic, readwrite, copy) MyModel* model;
@end

@interface MyModel : NSObject <NSCopying, NSMutableCopying>
@property (nonatomic, readonly, assign) CGPoint lastMouseLocation;
@end

@interface MyMutableModel : MyModel
@property (nonatomic, readwrite, assign) CGPoint lastMouseLocation;
@end

@interface MyBackgroundRenderingView : NSView
@property (nonatomic, readwrite, assign) CGPoint coordinates;
@end

@interface MyViewController : NSViewController
@end

@implementation MyAppDelegate
{
    MyViewController* _vc;
    NSTrackingArea* _trackingArea;
}

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    // Insert code here to initialize your application
    self.window.acceptsMouseMovedEvents = YES;

    NSTrackingAreaOptions opts = (NSTrackingActiveAlways | NSTrackingInVisibleRect | NSTrackingMouseMoved);
    _trackingArea = [[NSTrackingArea alloc] initWithRect: [self.window.contentView bounds]
                                                        options:opts
                                                          owner:self
                                                       userInfo:nil];
    [self.window.contentView addTrackingArea: _trackingArea];


    _vc = [[MyViewController alloc] initWithNibName: NSStringFromClass([MyViewController class]) bundle: [NSBundle mainBundle]];
    _vc.representedObject = self;

    _vc.view.frame = [self.window.contentView bounds];
    [self.window.contentView addSubview: _vc.view];
}

- (void)mouseEntered:(NSEvent *)theEvent
{
}

- (void)mouseExited:(NSEvent *)theEvent
{
}

- (void)mouseMoved:(NSEvent *)theEvent
{
    // Update the model for mouse movement.
    MyMutableModel* mutableModel = self.model.mutableCopy ?: [[MyMutableModel alloc] init];
    mutableModel.lastMouseLocation = theEvent.locationInWindow;
    self.model = mutableModel;
}

@end

@interface MyModel ()
// Re-declare privately so the setter exists for the mutable subclass to use
@property (nonatomic, readwrite, assign) CGPoint lastMouseLocation;
@end

@implementation MyModel

@synthesize lastMouseLocation;

- (id)copyWithZone:(NSZone *)zone
{
    if ([self isMemberOfClass: [MyModel class]])
    {
        return self;
    }

    MyModel* copy = [[MyModel alloc] init];
    copy.lastMouseLocation = self.lastMouseLocation;
    return copy;
}

- (id)mutableCopyWithZone:(NSZone *)zone
{
    MyMutableModel* copy = [[MyMutableModel alloc] init];
    copy.lastMouseLocation = self.lastMouseLocation;
    return copy;
}

@end

@implementation MyMutableModel
@end

@interface MyViewController (Downcast)
- (MyBackgroundRenderingView*)view; // downcast
@end

@implementation MyViewController

static void * const MyViewControllerKVOContext = (void*)&MyViewControllerKVOContext;

- (id)initWithNibName:(NSString *)nibNameOrNil bundle:(NSBundle *)nibBundleOrNil
{
    if (self = [super initWithNibName:nibNameOrNil bundle:nibBundleOrNil])
    {
        [self addObserver: self forKeyPath: @"representedObject.model.lastMouseLocation" options: NSKeyValueObservingOptionOld | NSKeyValueObservingOptionNew | NSKeyValueObservingOptionInitial context: MyViewControllerKVOContext];
    }
    return self;
}

- (void)dealloc
{
    [self removeObserver: self forKeyPath: @"representedObject.model.lastMouseLocation" context: MyViewControllerKVOContext];
}

- (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary *)change context:(void *)context
{
    if (MyViewControllerKVOContext == context)
    {
        // update the view...
        NSValue* oldCoordinates = change[NSKeyValueChangeOldKey];
        oldCoordinates = [oldCoordinates isKindOfClass: [NSValue class]] ? oldCoordinates : nil;
        NSValue* newCoordinates = change[NSKeyValueChangeNewKey];
        newCoordinates = [newCoordinates isKindOfClass: [NSValue class]] ? newCoordinates : nil;
        CGPoint old = CGPointZero, new = CGPointZero;
        [oldCoordinates getValue: &old];
        [newCoordinates getValue: &new];

        if (!CGPointEqualToPoint(old, new))
        {
            self.view.coordinates = new;
        }
    }
    else
    {
        [super observeValueForKeyPath:keyPath ofObject:object change:change context:context];
    }
}

@end

@interface MyBackgroundRenderingView ()
@property (nonatomic, readwrite, retain) id toDisplay; // doesn't need to be atomic because it should only ever be used on the main thread.
@end

@implementation MyBackgroundRenderingView
{
    // Pointer-sized reads/writes are effectively atomic on the platforms we
    // care about, so these counters are read from background threads without
    // explicit locking.
    intptr_t _lastFrameStarted;
    intptr_t _lastFrameDisplayed;
    CGPoint _coordinates;
}

@synthesize coordinates = _coordinates;

- (void)setCoordinates:(CGPoint)coordinates
{
    _coordinates = coordinates;

    // instead of setNeedDisplay...
    [self doBackgroundRenderingForPoint: coordinates];
}

- (void)setNeedsDisplay:(BOOL)flag
{
    if (flag)
    {
        [self doBackgroundRenderingForPoint: self.coordinates];
    }
}

- (void)doBackgroundRenderingForPoint: (CGPoint)value
{
    NSAssert(NSThread.isMainThread, @"main thread only...");

    const intptr_t thisFrame = _lastFrameStarted++;
    const NSSize imageSize = self.bounds.size;
    const NSRect imageRect = NSMakeRect(0, 0, imageSize.width, imageSize.height);

    dispatch_async(dispatch_get_global_queue(0, 0), ^{

        // If another frame is already queued up, don't bother starting this one
        if (_lastFrameStarted - 1 > thisFrame)
        {
            dispatch_async(dispatch_get_global_queue(0, 0), ^{ NSLog(@"Not rendering a frame because there's a more recent one queued up already."); });
            return;
        }

        // Introduce an arbitrary fake rendering delay (up to ~64ms, i.e. about 1/15th of a second).
        const uint32_t delays = arc4random_uniform(65);
        for (NSUInteger i = 1; i < delays; i++)
        {
            // A later frame has been displayed. Give up on rendering this old frame.
            if (_lastFrameDisplayed > thisFrame)
            {
                dispatch_async(dispatch_get_global_queue(0, 0), ^{ NSLog(@"Aborting rendering a frame that wasn't ready in time"); });
                return;
            }
            usleep(1000);
        }

        // render image...
        NSImage* image = [[NSImage alloc] initWithSize: imageSize];
        [image lockFocus];
        NSString* coordsString = [NSString stringWithFormat: @"%g,%g", value.x, value.y];
        [coordsString drawInRect: imageRect withAttributes: nil];
        [image unlockFocus];

        NSArray* toDisplay = @[ image, @(thisFrame) ];
        dispatch_async(dispatch_get_main_queue(), ^{
            self.toDisplay = toDisplay;
            [super setNeedsDisplay: YES];
        });
    });
}

- (void)drawRect:(NSRect)dirtyRect
{
    NSArray* toDisplay = self.toDisplay;
    if (!toDisplay)
        return;
    NSImage* img = toDisplay[0];
    const int64_t frameOrdinal = [toDisplay[1] longLongValue];

    if (frameOrdinal < _lastFrameDisplayed)
        return;

    [img drawInRect: self.bounds];
    _lastFrameDisplayed = frameOrdinal;

    dispatch_async(dispatch_get_global_queue(0, 0), ^{ NSLog(@"Displayed a frame"); });
}

@end

Conclusion

In the abstract, just decoupling rendering from the main thread, but not necessarily parallelizing it (i.e. the first case), may be enough. To go further from there, you probably want to investigate ways to parallelize your per-frame render operation. Parallelizing the drawing of multiple frames confers some advantages, but in a battery-powered environment like iOS it's likely to turn your app/game into a battery hog.

For any situation in which model updates, and not rendering, are the limiting reagent, the right approach is going to depend heavily on the specific details of the situation, and is much harder to generalize about than rendering.

answered by ipmcc


My 2 cents worth.

In my limited understanding, a GL game always goes: update, then render.

The update cycle updates all of the game's visually changing parts (i.e. location/color/etc.) to their next time-wise values. This can be done on a worker thread, perhaps in your case ahead of time, queued up as a set of future values for t, t+1, t+2, ..., t+n.

The render cycle does the actual rendering on the main thread, selectively using the values computed above (t, t+1, t+2, ..., t+n). All rendering must be done on the main thread, otherwise you will start to see peculiar artifacts. In the render cycle, depending on the elapsed time, you can skip frames/fast-forward (i.e. render the t+1 and t+4 values) or play in slow motion (t+0.1, t+0.2).
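One common shape for this is a fixed-timestep loop with interpolation (in the spirit of the well-known "Fix Your Timestep" pattern). A sketch driven by a CADisplayLink, where the state ivars and helper methods are hypothetical:

static const NSTimeInterval kStep = 1.0 / 60.0; // fixed update step

- (void)displayLinkFired:(CADisplayLink *)link
{
    NSTimeInterval now = CACurrentMediaTime();
    _accumulator += now - _lastTickTime;
    _lastTickTime = now;

    // Update: runs 0..n times per display frame. Running several times is
    // the skip-frames/fast-forward case; running zero times while still
    // rendering below is the slow-motion/interpolation case.
    while (_accumulator >= kStep)
    {
        _previousState = _currentState;
        _currentState = [self stateByAdvancing: _currentState by: kStep]; // hypothetical
        _accumulator -= kStep;
    }

    // Render on the main thread, blending the two newest states.
    double alpha = _accumulator / kStep;
    [self renderBlendOf: _previousState and: _currentState alpha: alpha]; // hypothetical
}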

Good luck with your studies!

answered by dklt