QML Performance issue when updating an item in presence of many non-overlapping items

In the following QML, the only dynamic part is the blinking rectangle. Although it has no relation to the generated items, the blinking rectangle causes a heavy load and slows down the system (e.g. 100% CPU load on the i.MX6 processor I am using), even though there is no overlap or binding between it and the other items. Removing the Repeater solves the issue, and the rectangle blinks smoothly.

import QtQuick 2.3

Rectangle  {
    id: root
    anchors.fill: parent

    Repeater {
        model: 10000
        delegate: Rectangle {
            width: 5
            height: 5
            x: (index % 200)*6
            y: 50 + Math.floor(index / 200)*6
            color: "blue"
            border.color: "black"
        }
    }

    Rectangle {
        property bool blinker: false
        width: 20
        height: 20
        color: blinker ? "green" : "red"

        Timer {
            running: true
            interval: 100
            repeat: true
            onTriggered: { parent.blinker = !parent.blinker }
        }
    }
}

Here is the output (the red rectangle blinks in the actual application): [screenshot of the output]

If you have faster hardware and don't see a slowdown, you may need to raise the Repeater's model: 10000 value. The code was tested on Qt 5.3.2 and Qt 5.5.0, and the problem was present in both.

My actual application has fewer model items (~100) but a more complex delegate. The CPU (GPU?) usage therefore seems to depend on both the complexity of the delegate and the number of model items in the Repeater.

Why does having a large number of items (or complex items) generated by a Repeater affect the performance of the application when they have no relation to, or overlap with, the other dynamic object(s)?

Update 1

I've replaced the Repeater with the following JavaScript code, which generates the same number of objects with the same properties:

Component.onCompleted: {
    var objstr = 'import QtQuick 2.0;Rectangle{id:sample;width:5; height:5;color:"blue";border.color: "black"}';
    for(var i=0;i<200;i++) {
        for(var j=0;j<50;j++) {
            var obj = Qt.createQmlObject(objstr,root);
            obj.x = i * 6
            obj.y = 50 + j*6
        }
    }
}

But the performance issue was still present.

Update 2

I've done some investigation based on this article.

QSG_RENDERER_DEBUG=render

Setting this flag outputs some debugging information about rendering and batching. The output for the test application:

isaac@ubuntu:~$ QSG_RENDERER_DEBUG=render ./qml-test 
QML debugging is enabled. Only use this in a safe environment.
Batch thresholds: nodes: 64  vertices: 1024
Using buffer strategy: static
Renderer::render() QSGAbstractRenderer(0x93b9570) "rebuild: full"
Rendering:
 -> Opaque: 14002 nodes in 2 batches...
 -> Alpha: 0 nodes in 0 batches...
 - 0x8f0a698 [  upload] [  clip] [opaque] [  merged]  Nodes: 14000  Vertices: 168000  Indices: 224000  root: 0xb3e2a90 sets: 3
 - 0x8f0b310 [  upload] [noclip] [opaque] [  merged]  Nodes:    2  Vertices:     8  Indices:    12  root: 0x0
Renderer::render() QSGAbstractRenderer(0x93b9570) "rebuild: none"
Rendering:
 -> Opaque: 14002 nodes in 2 batches...
 -> Alpha: 0 nodes in 0 batches...
 - 0x8f0a698 [retained] [  clip] [opaque] [  merged]  Nodes: 14000  Vertices: 168000  Indices: 224000  root: 0xb3e2a90 sets: 3
 - 0x8f0b310 [retained] [noclip] [opaque] [  merged]  Nodes:    2  Vertices:     8  Indices:    12  root: 0x0
Renderer::render() QSGAbstractRenderer(0x93b9570) "rebuild: none"

This shows that the items are batched in 2 groups: one with 14000 nodes and one with 2 nodes. This seems to be what we would expect.

QSG_VISUALIZE=batches

This switch visualizes the batches on the UI. Running this shows a solid color covering the whole UI. This means the blinking rectangle and the small rectangles are being rendered in one batch:

[screenshot: batch visualization, a single solid color covering the whole UI]

Setting clip: true didn't force the batches apart. By setting opacity: 0.5 on the blinking rectangle, I finally got the QML engine to put it into a separate batch:

[screenshot: batch visualization with the blinking rectangle in its own batch]

Interestingly, the blinking was still affected and slowed down by the high number of small rectangles!

QSG_RENDER_TIMING=1

The last flag I tried was QSG_RENDER_TIMING, which reports timing information for rendering. Based on the output, most of the time is spent in the render step of the render loop. According to the Qt documentation, render time is

Total time spent rendering the frame, including preparing and uploading all the necessary data to the GPU. This is the gross render time. Do not confuse it with the net Render Render time below.

but this wasn't helpful to me. So far, I haven't been able to find the root cause of this issue.

isaac@ubuntu:~$ QSG_RENDER_TIMING=1 ./qml-test 
QML debugging is enabled. Only use this in a safe environment.
qt.scenegraph.time.compilation: shader compiled in 3ms
qt.scenegraph.time.renderer: time in renderer: total=27ms, preprocess=0, updates=5, binding=0, rendering=21
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 107ms, polish=0, sync=65, render=27, swap=1, frameDelta=0
qt.scenegraph.time.renderer: time in renderer: total=1ms, preprocess=0, updates=0, binding=0, rendering=1
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 1ms, polish=0, sync=0, render=1, swap=0, frameDelta=2
qt.scenegraph.time.renderer: time in renderer: total=8ms, preprocess=0, updates=0, binding=0, rendering=8
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 255ms, polish=0, sync=0, render=8, swap=24, frameDelta=255
qt.scenegraph.time.renderer: time in renderer: total=1ms, preprocess=0, updates=0, binding=0, rendering=1
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 290ms, polish=0, sync=0, render=1, swap=28, frameDelta=297
qt.scenegraph.time.renderer: time in renderer: total=0ms, preprocess=0, updates=0, binding=0, rendering=0
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 296ms, polish=0, sync=0, render=0, swap=29, frameDelta=303
qt.scenegraph.time.renderer: time in renderer: total=298ms, preprocess=0, updates=0, binding=0, rendering=298
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 300ms, polish=0, sync=0, render=298, swap=0, frameDelta=306
qt.scenegraph.time.renderer: time in renderer: total=592ms, preprocess=0, updates=0, binding=0, rendering=592
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 593ms, polish=0, sync=0, render=592, swap=0, frameDelta=600
qt.scenegraph.time.renderer: time in renderer: total=292ms, preprocess=0, updates=0, binding=0, rendering=292
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 298ms, polish=0, sync=0, render=295, swap=0, frameDelta=305
qt.scenegraph.time.renderer: time in renderer: total=286ms, preprocess=0, updates=0, binding=0, rendering=286
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 291ms, polish=0, sync=0, render=286, swap=0, frameDelta=298
qt.scenegraph.time.renderer: time in renderer: total=291ms, preprocess=0, updates=0, binding=0, rendering=291
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 296ms, polish=0, sync=0, render=294, swap=0, frameDelta=305
qt.scenegraph.time.renderer: time in renderer: total=286ms, preprocess=0, updates=0, binding=0, rendering=286
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 292ms, polish=0, sync=0, render=286, swap=0, frameDelta=298
qt.scenegraph.time.renderer: time in renderer: total=290ms, preprocess=0, updates=0, binding=0, rendering=290
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 295ms, polish=0, sync=0, render=291, swap=0, frameDelta=301
qt.scenegraph.time.renderer: time in renderer: total=297ms, preprocess=0, updates=0, binding=0, rendering=297
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 302ms, polish=0, sync=0, render=298, swap=0, frameDelta=310
qt.scenegraph.time.renderer: time in renderer: total=290ms, preprocess=0, updates=0, binding=0, rendering=290
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 293ms, polish=0, sync=0, render=290, swap=0, frameDelta=316
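
A log like this is easier to read when condensed. The following is a minimal sketch, not part of the original investigation, that extracts the render= value from each renderloop line and prints the frame count and the mean render time. The function name summarize_frames is made up for illustration, and the three sample lines are copied from the log above.

```shell
# Hypothetical helper: read QSG_RENDER_TIMING output on stdin and
# report how many renderloop frames were seen and their mean render time (ms).
summarize_frames() {
    # Keep only renderloop lines and pull out the number after " render=".
    sed -n 's/.*renderloop in [0-9]*ms.* render=\([0-9]*\),.*/\1/p' |
        awk '{ sum += $1; n++ } END { if (n) printf "frames=%d mean_render_ms=%d\n", n, sum / n }'
}

# Three lines copied from the log above, fed in through a here-document.
summarize_frames <<'EOF'
qt.scenegraph.time.renderer: time in renderer: total=298ms, preprocess=0, updates=0, binding=0, rendering=298
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 300ms, polish=0, sync=0, render=298, swap=0, frameDelta=306
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 298ms, polish=0, sync=0, render=295, swap=0, frameDelta=305
qt.scenegraph.time.renderloop: Frame rendered with 'basic' renderloop in 291ms, polish=0, sync=0, render=286, swap=0, frameDelta=298
EOF
# prints: frames=3 mean_render_ms=293
```

Against a live run, the same function could be fed with something like QSG_RENDER_TIMING=1 ./qml-test 2>&1 | summarize_frames, assuming the timing messages go to stderr as Qt logging output normally does.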
Asked by Isaac on Dec 21 '15 at 16:12


1 Answer

This is an old question, but it looks like there's no real resolution here, so I'll do my best to chime in with some useful bits and pieces.

So, you're definitely partly on the right track with looking at batching, great start. I imagine the reason you didn't see any effect from setting clip: true is that you may have been setting it in the wrong place -- you need to either set it on the bottom Rectangle (containing the Timer), or wrap the Repeater in something else that you can clip, like:

Item {
    anchors.fill: parent
    clip: true
    Repeater {
        ...
    }
}

This is because, while the Repeater inherits the Item type, it is a bit of a special item. The children it creates are parented to the parent of the repeater, not the repeater itself, so the repeater would have clipping set – but no visual children to apply that clipping to in your case.

The ideal solution here would be to set clip: true both on something containing the Repeater (as done above) and on the bottom Rectangle, to ensure that neither of the two subtrees affects the performance of the other.

However, you note that this didn't directly solve your problem, so let's move on from batching to other things.

A quick observation: I notice that you are using the 'basic' renderloop instead of the 'threaded' one. Is there a reason for this? It won't buy you much with the example you have here (as you don't have many bindings evaluating and no other application to speak of), but in a real world case, it should be quite a bit better, so I would recommend trying to use it if at all possible.
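
As a rough sketch of how to try this: QSG_RENDER_LOOP is the scene graph's environment variable for choosing the render loop. Whether the threaded loop is actually used still depends on the platform plugin and driver, so treat this as an experiment rather than a guaranteed switch. The binary name ./qml-test is taken from the logs in the question.

```shell
# Ask the Qt Quick scene graph for the threaded render loop.
export QSG_RENDER_LOOP=threaded
# Keep the timing output on so the log shows which loop is really in use.
export QSG_RENDER_TIMING=1

# Run the test application from the question if it is present here.
if [ -x ./qml-test ]; then
    ./qml-test
fi
```

If the timing lines still say 'basic' after this, the platform most likely does not allow the threaded loop in that configuration.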

Once you get past that, you need to know that the QtQuick scenegraph expects to run with a blocking vsync. Animations and everything else all tie in to the vsync of your display. When you're working at this level, you need to know how your graphics setup works, and take special care to make sure that you are enabling that to happen.

So, now let's talk about the hardware side of the picture. I don't know precisely what your setup is on the i.MX6, but I'm assuming you're using Linux & Vivante drivers on fbdev, and the eglfs QPA plugin from Qt. First things first, you should play around with the FB_MULTI_BUFFER environment variable to ensure you are tied to the vsync of the display (i.e. you probably want FB_MULTI_BUFFER=2 or FB_MULTI_BUFFER=3). I don't know if this is now set automatically, but it wasn't when I last had to work on such a system.

Assuming you are using fbdev, the mechanism for waiting on the display is an ioctl. You want to look at your display driver in the kernel, and see if it's respecting the FBIO_WAITFORVSYNC ioctl, and compile Qt to use that (grep qtbase for FBIO_WAITFORVSYNC – it should be somewhere in the eglfs platform plugin). You'll also note that it's "hidden" behind an environment variable: QT_QPA_EGLFS_FORCEVSYNC, so you'll want to export QT_QPA_EGLFS_FORCEVSYNC=1 once you have ensured it's built to issue that ioctl. While you're at it, you should check that the FBIOGET_VSCREENINFO ioctl is returning useful and correct information, as eglfs will use the returned information from that to determine the refresh rate of the display (see q_refreshRateFromFb in the eglfs plugin).
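
Put together, the checks and settings from the last two paragraphs might look like the sketch below. QTBASE_DIR is an assumed location for a qtbase source checkout (adjust it to your tree), ./qml-test is the binary name from the question's logs, and the variable values are starting points to experiment with, not known-good settings for any particular board.

```shell
# Assumption: QTBASE_DIR points at a qtbase source checkout.
QTBASE_DIR="${QTBASE_DIR:-./qtbase}"

# See whether this Qt source tree knows about the vsync ioctl at all.
if [ -d "$QTBASE_DIR/src/plugins/platforms" ]; then
    grep -rn "FBIO_WAITFORVSYNC" "$QTBASE_DIR/src/plugins/platforms" \
        || echo "FBIO_WAITFORVSYNC not found in platform plugins"
else
    echo "no qtbase checkout at $QTBASE_DIR; skipping source search"
fi

# Multi-buffering for the Vivante fbdev setup (try 2 first, then 3).
export FB_MULTI_BUFFER=2
# Ask eglfs to issue the vsync ioctl; only useful if Qt was built with it.
export QT_QPA_EGLFS_FORCEVSYNC=1

# Run the test application if it is present here.
if [ -x ./qml-test ]; then
    ./qml-test
fi
```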

After all that, things may improve for you. If they don't, I can say that on a similar setup, I've run into cases before where there was no way to force-throttle rendering (where FBIO_WAITFORVSYNC was effectively unusable), which means that you're left to do this yourself. I don't know how universal this problem is, but it may well apply to you, so:

If you are in such a situation, you can set the QT_QPA_UPDATE_IDLE_TIME=x environment variable to tell Qt to wait at least x ms before drawing another frame. For instance, export QT_QPA_UPDATE_IDLE_TIME=32 would wait at least 32 ms between frames, giving you roughly 30 FPS. You should treat this with some caution though, as it is far from an ideal scenario, and it's not really what I would call a widely "supported" thing.
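
The frame-rate figure is just one second divided by the idle time, and it is an upper bound, because the time spent rendering each frame comes on top of the wait. A small sketch of that arithmetic, where max_fps is a made-up helper name and shell integer division rounds down:

```shell
# Hypothetical helper: upper bound on frames per second for an idle time in ms.
max_fps() {
    echo $(( 1000 / $1 ))
}

# The value used as an example in the text above.
export QT_QPA_UPDATE_IDLE_TIME=32
echo "idle time ${QT_QPA_UPDATE_IDLE_TIME} ms -> at most about $(max_fps "$QT_QPA_UPDATE_IDLE_TIME") FPS"
# prints: idle time 32 ms -> at most about 31 FPS
```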

Answered by Robin Burchell on Nov 01 '22 at 13:11