Should I try to use as many queues as possible?


On my machine I have two queue families, one that supports everything and one that only supports transfer.

The queue family that supports everything has a queueCount of 16.

Now the spec states:

"Command buffers submitted to different queues may execute in parallel or even out of order with respect to one another."

Does that mean I should try to use all available queues for maximal performance?
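For context, here is roughly how those families are queried; a minimal sketch in C, assuming the VkInstance and VkPhysicalDevice setup is done elsewhere:

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    /* Print each queue family's capabilities and queueCount for a
     * physical device the caller has already picked. */
    void print_queue_families(VkPhysicalDevice gpu)
    {
        uint32_t count = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, NULL);

        VkQueueFamilyProperties props[16];
        if (count > 16) count = 16; /* keep the example allocation-free */
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, props);

        for (uint32_t i = 0; i < count; ++i) {
            printf("family %u: queueCount=%u graphics=%d compute=%d transfer=%d\n",
                   i, props[i].queueCount,
                   (props[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) != 0,
                   (props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) != 0,
                   (props[i].queueFlags & VK_QUEUE_TRANSFER_BIT) != 0);
        }
    }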

asked Jun 01 '16 by Maik Klein




2 Answers

Yes. If you have workloads that are highly independent, use separate queues.

If the queues need a lot of synchronization between themselves, it may kill any potential benefit you may get.

Basically, in the case of a single queue family, what you are doing is supplying the GPU with alternative work it can choose from, letting it fill stalls, bubbles, and idle time. There is also some potential to make better use of the CPU (e.g. one queue per thread instead of single-threaded submission).

Using separate transfer queues (or another specialized family) even seems to be the recommended approach.
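For illustration, a minimal sketch of creating a device with one queue from the all-purpose family and one from the transfer family. The family indices 0 and 1 are assumptions; query the real indices with vkGetPhysicalDeviceQueueFamilyProperties first:

    #include <vulkan/vulkan.h>

    /* Create a device with one graphics queue and one queue from an
     * assumed separate transfer-only family. */
    VkDevice create_device(VkPhysicalDevice gpu,
                           VkQueue *graphicsQueue, VkQueue *transferQueue)
    {
        float priority = 1.0f;
        VkDeviceQueueCreateInfo queueInfos[2] = {
            { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
              .queueFamilyIndex = 0,          /* assumed graphics family */
              .queueCount = 1,
              .pQueuePriorities = &priority },
            { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
              .queueFamilyIndex = 1,          /* assumed transfer family */
              .queueCount = 1,
              .pQueuePriorities = &priority },
        };
        VkDeviceCreateInfo info = {
            .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
            .queueCreateInfoCount = 2,
            .pQueueCreateInfos = queueInfos,
        };

        VkDevice device = VK_NULL_HANDLE;
        if (vkCreateDevice(gpu, &info, NULL, &device) != VK_SUCCESS)
            return VK_NULL_HANDLE;

        vkGetDeviceQueue(device, 0, 0, graphicsQueue);
        vkGetDeviceQueue(device, 1, 0, transferQueue);
        return device;
    }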

That is generally speaking. A more realistic, empirical, sceptical, and practical view was already presented by the SW and NB answers. In reality one does have to be a bit more cautious, as those queues target the same resources, share the same limits, and are subject to other common restrictions, which limits the potential benefit. Notably, if the driver does the wrong thing with multiple queues, it may be very bad for the cache.

AMD's Leveraging asynchronous queues for concurrent execution (2016) discusses a bit how this maps to their HW/driver. It shows the potential benefits of using separate queue families. It says that although they offer two queues of the compute family, they did not observe benefits in apps at that time. They also explain why they expose only one graphics queue.

NVIDIA seems to have a similar idea of "async compute", shown in Moving to Vulkan: Asynchronous compute.

To be safe, it seems we should still stick with only one graphics queue and one async compute queue on current HW. 16 queues seem like a trap and a way to hurt yourself.

With transfer queues it is not as simple as it seems either. You should use the dedicated ones for Host->Device transfers, and the non-dedicated ones for Device->Device transfer ops.
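A hedged sketch of picking the dedicated family, i.e. one that is transfer-capable but offers neither graphics nor compute:

    #include <stdint.h>
    #include <vulkan/vulkan.h>

    /* Return the index of a dedicated transfer family (typically the
     * DMA engine best suited for Host->Device copies), or UINT32_MAX
     * if none exists. */
    uint32_t find_dedicated_transfer_family(const VkQueueFamilyProperties *props,
                                            uint32_t count)
    {
        for (uint32_t i = 0; i < count; ++i) {
            VkQueueFlags f = props[i].queueFlags;
            if ((f & VK_QUEUE_TRANSFER_BIT) &&
                !(f & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT)))
                return i;
        }
        return UINT32_MAX; /* no dedicated family; fall back to graphics */
    }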

answered Nov 09 '22 by krOoze


To what end?

Take the typical structure of a deferred renderer. You build your g-buffers, do your lighting passes, do some post-processing and tone mapping, maybe throw in some transparent stuff, and then present the final image. Each process depends on the previous process having completed before it can begin. You can't do your lighting passes until you've finished your g-buffer. And so forth.

How could you parallelize that across multiple queues of execution? You can't parallelize the g-buffer building or the lighting passes, since all of those commands are writing to the same attached images (and you can't do that from multiple queues). And if they're not writing to the same images, then you're going to have to pick a queue in which to combine the resulting images into the final one. Also, I have no idea how depth buffering would work without using the same depth buffer.

And that combination step would require synchronization.
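To make that concrete, a sketch (not a definitive recipe) of cross-queue synchronization: the submission on the second queue waits, at the fragment stage, on a semaphore the first queue signals. All handles here are assumed to have been created elsewhere:

    #include <vulkan/vulkan.h>

    /* Submit cmdA on queueA and cmdB on queueB, with cmdB's fragment
     * work blocked until cmdA's submission signals the semaphore. */
    void submit_with_cross_queue_sync(VkQueue queueA, VkQueue queueB,
                                      VkCommandBuffer cmdA, VkCommandBuffer cmdB,
                                      VkSemaphore semaphore)
    {
        VkSubmitInfo submitA = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .commandBufferCount = 1,
            .pCommandBuffers = &cmdA,
            .signalSemaphoreCount = 1,
            .pSignalSemaphores = &semaphore,   /* signaled when cmdA is done */
        };
        vkQueueSubmit(queueA, 1, &submitA, VK_NULL_HANDLE);

        VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
        VkSubmitInfo submitB = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .waitSemaphoreCount = 1,
            .pWaitSemaphores = &semaphore,     /* block cmdB's fragment stage */
            .pWaitDstStageMask = &waitStage,
            .commandBufferCount = 1,
            .pCommandBuffers = &cmdB,
        };
        vkQueueSubmit(queueB, 1, &submitB, VK_NULL_HANDLE);
    }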

Now, there are many tasks which can be parallelized. Doing frustum culling. Particle system updates. Memory transfers. Things like that; data which is intended for the next frame. But how many queues could you realistically keep busy at once? 3? Maybe 4?

Not to mention, you're going to need to build a rendering system which can scale. Vulkan does not require that implementations provide more than one queue. So your code needs to be able to run reasonably on a system that only offers one queue as well as on a system that offers 16. And to take advantage of a 16-queue system, you might need to render very differently.
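As a sketch of that scaling, with a hypothetical engine setting `desired`, and assuming the device was created requesting that many queues from the family (clamped to its queueCount):

    #include <vulkan/vulkan.h>

    /* Fetch up to `desired` queues from the graphics family, never
     * assuming more than the family's queueCount exist. Returns how
     * many queues were actually retrieved; may be just 1. */
    uint32_t get_scaled_queues(VkDevice device,
                               const VkQueueFamilyProperties *props,
                               uint32_t graphicsFamily, uint32_t desired,
                               VkQueue *queues /* at least `desired` slots */)
    {
        uint32_t available = props[graphicsFamily].queueCount; /* may be 1 */
        uint32_t used = desired < available ? desired : available;

        for (uint32_t i = 0; i < used; ++i)
            vkGetDeviceQueue(device, graphicsFamily, i, &queues[i]);
        return used; /* if this is 1, everything funnels through queues[0] */
    }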

Oh, and be advised that if you ask for a bunch of queues, but don't use them, performance could be impacted. If you ask for 8 queues, the implementation has no choice but to assume that you intend to be able to issue 8 concurrent sets of commands. Which means that the hardware cannot dedicate all of its resources to a single queue. So if you only ever use 3 of them... you may be losing over 50% of your potential performance to resources that the implementation is waiting for you to use.

Granted, the implementation could scale such things dynamically. But unless you profile this particular case, you'll never know. Oh, and if it does scale dynamically... then you won't be gaining a whole lot from using multiple queues like this either.

Lastly, there has been some research, on several platforms, into how effective multiple queue submissions can be at keeping the GPU fed (read all of the parts). The general long and short of it seems to be that:

  1. Having multiple queues executing genuine rendering operations isn't helpful.
  2. Having a single rendering queue with one or more compute queues (either actual compute queues or graphics queues you submit compute work to) is useful for keeping execution units well saturated during rendering operations (see the sketch below).
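For point 2, a hedged sketch of picking a compute-capable family distinct from the graphics family for that async compute work, falling back to the graphics family itself if nothing else qualifies:

    #include <vulkan/vulkan.h>

    /* Return a compute-capable family other than the graphics family,
     * or the graphics family itself if no separate one exists. */
    uint32_t find_async_compute_family(const VkQueueFamilyProperties *props,
                                       uint32_t count, uint32_t graphicsFamily)
    {
        for (uint32_t i = 0; i < count; ++i) {
            if (i == graphicsFamily)
                continue;
            if (props[i].queueFlags & VK_QUEUE_COMPUTE_BIT)
                return i;
        }
        return graphicsFamily; /* no separate compute family available */
    }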
answered Nov 09 '22 by Nicol Bolas