In Vulkan, you can use vkCreateGraphicsPipelines or vkCreateComputePipelines to create pipeline derivatives, with the basePipelineHandle or basePipelineIndex members of VkGraphicsPipelineCreateInfo/VkComputePipelineCreateInfo. The documentation states that this feature is available for performance reasons:
The goal of derivative pipelines is that they be cheaper to create using the parent as a starting point, and that it be more efficient (on either host or device) to switch/bind between children of the same parent.
This raises quite a few questions for me:
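1. Is there a way to indicate which state is shared between parent and child pipelines?
2. Is there any way to know whether the implementation is actually getting any benefit from using derived pipelines (other than profiling)?
3. The parent pipeline needs to be created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT. Is there a downside to always using this flag (e.g. in case you may create a derived pipeline from this one in the future)?
I came to this question investigating whether pipeline derivatives provide a benefit. Here are some resources I found from vendors: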
Tips and Tricks: Vulkan Dos and Don’ts, Nvidia, June 6, 2019
Don’t expect speedup from Pipeline Derivatives.
Vulkan Usage Recommendations, Samsung
Pipeline derivatives let applications express "child" pipelines as incremental state changes from a similar "parent"; on some architectures, this can reduce the cost of switching between similar states. Many mobile GPUs gain performance primarily through pipeline caches, so pipeline derivatives often provide no benefit to portable mobile applications.
Recommendations
- Create pipelines early in application execution. Avoid pipeline creation at draw time.
- Use a single pipeline cache for all pipeline creation.
- Write the pipeline cache to a file between application runs.
- Avoid pipeline derivatives.
Vulkan Best Practice for Mobile Developers - Pipeline Management, Arm Software, Jul 11, 2019
Don't
- Create pipelines at draw time without a pipeline cache (introduces performance stutters).
- Use pipeline derivatives as they are not supported.
Vulkan Samples, LunarG, API-Samples/pipeline_derivative/pipeline_derivative.cpp
/*
VULKAN_SAMPLE_SHORT_DESCRIPTION
This sample creates pipeline derivative and draws with it.
Pipeline derivatives should allow for faster creation of pipelines.
In this sample, we'll create the default pipeline, but then modify
it slightly and create a derivative. The derivatve will be used to
render a simple cube.
We may later find that the pipeline is too simple to show any speedup,
or that replacing the fragment shader is too expensive, so this sample
can be updated then.
*/
It doesn't look like any vendor is actually recommending the use of pipeline derivatives, except maybe to speed up pipeline creation.
To me, pipeline derivatives seem like a good idea in theory, designed for a hypothetical implementation, that doesn't amount to much in practice.
Also, if the driver is supposed to benefit from a common parent of multiple pipelines, it should be completely able to automate that ancestor detection. "Common ancestors" could be synthesized based on whichever specific common pipeline states provide the best speed-up. Why specify it explicitly through the API?
Is there a way to indicate which state is shared between parent and child pipelines?
No; the pipeline creation API provides no way to tell it what state will change. The idea is that, since the implementation can see both the parent's state and the state you request for the child, it can tell what's different.
Also, if there were such a way, it would only represent a way for you to accidentally misinform the implementation as to what changed. Better to just let the implementation figure out the changes.
Is there any way to know whether the implementation is actually getting any benefit from using derived pipelines (other than profiling)?
No.
The parent pipeline needs to be created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT. Is there a downside to always using this flag (e.g. in case you may create a derived pipeline from this one in the future)?
Probably. Due to #1, the implementation needs to store at least some form of the parent pipeline's state, so that it can compare it to the child pipeline's state. And it must store this state in an easily readable form, which will probably not be the same form as the GPU memory and tokens to be copied into the command stream. As such, there's a good chance that parent pipelines will allocate additional memory for such data, though they are unlikely to be any slower at binding/command execution time.
You can test this easily enough by passing an allocator to the pipeline creation functions. If it allocates the same amount of memory as without the flag, then it probably isn't storing anything.