I was following the Vulkan Tutorial at https://vulkan-tutorial.com/ and, in the depth buffering chapter, the author Alexander Overvoorde mentions that "We only need a single depth image, because only one draw operation is running at once." This is where my issue comes in.
I've read many SO questions and articles/blog posts on Vulkan synchronization over the past few days, but I can't seem to reach a conclusion. The information I've gathered so far is the following:
Draw calls in the same subpass execute on the GPU as if they were in order, but only if they draw to the framebuffer. I can't recall exactly where I read this (it might have been a tech talk on YouTube), so I am not 100% sure about it. As far as I understood, this is more GPU hardware behavior than Vulkan behavior. That would essentially mean the above is true in general, including across subpasses and even render passes, which would answer my question. However, I can't find any clear information on this.
The closest I've gotten to an answer is this reddit comment, which the OP seemed to accept. However, its justification is based on two things:
"there is a queue flush at the high level that ensures previously submitted render passes are finished"
"the render passes themselves describe what attachments they read from and write to as external dependencies"
I don't see any high-level queue flush, unless there is some sort of explicit one that I cannot find in the specification for the life of me. I also don't see where the render pass describes dependencies on its attachments: it describes the attachments, but not the dependencies (at least not explicitly). I have read the relevant chapters of the specification multiple times, but I feel the language is not clear enough for a beginner to fully grasp.
I would also really appreciate Vulkan specification quotes where possible.
Edit: to clarify, the final question is: What synchronization mechanism guarantees that the draw call in the next command buffer is not submitted until the current draw call is finished?
I'm afraid I have to say that the Vulkan Tutorial is wrong. In its current state, it cannot guarantee that there are no memory hazards when only a single depth buffer is used. However, only a very small change is needed to make a single depth buffer sufficient.
Let's analyze the relevant steps of the code that are performed within drawFrame.
We have two different queues, presentQueue and graphicsQueue, and MAX_FRAMES_IN_FLIGHT concurrent frames. I refer to the "in flight index" as cf, which stands for currentFrame and is advanced via currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT. I use sem1 and sem2 for the two arrays of semaphores, and fence for the array of fences.
The relevant steps in pseudocode are the following:
vkWaitForFences(..., fence[cf], ...);
vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
vkResetFences(..., fence[cf]);
vkQueueSubmit(graphicsQueue, ...
    /* wait for: */ sem1[cf], /* wait stage: */ COLOR_ATTACHMENT_OUTPUT, ...
    vkCmdBeginRenderPass(cb[cf], ...);
        Subpass Dependency between EXTERNAL -> 0:
            srcStages = COLOR_ATTACHMENT_OUTPUT,
            srcAccess = 0,
            dstStages = COLOR_ATTACHMENT_OUTPUT,
            dstAccess = COLOR_ATTACHMENT_WRITE
        ...
        vkCmdDrawIndexed(cb[cf], ...);
        (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
            srcStages = ALL_COMMANDS,
            srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
            dstStages = BOTTOM_OF_PIPE,
            dstAccess = 0
    vkCmdEndRenderPass(cb[cf]);
    /* signal when done: */ sem2[cf], ...
    /* signal when done: */ fence[cf]
);
vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
The draw calls are performed on a single queue: the graphicsQueue. We must check whether commands on that graphicsQueue could theoretically overlap.
Let us consider the events that happen on the graphicsQueue in chronological order for the first two frames:
img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
img[1] -> sem1[1] signal -> t|...|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
where t|...|ef|fs|lf|co|b stands for the different pipeline stages a draw call passes through:
t ... TOP_OF_PIPE
ef ... EARLY_FRAGMENT_TESTS
fs ... FRAGMENT_SHADER
lf ... LATE_FRAGMENT_TESTS
co ... COLOR_ATTACHMENT_OUTPUT
b ... BOTTOM_OF_PIPE
There might be an implicit dependency between sem2[i] signal -> present and sem1[i+1]. However, it only applies when the swap chain provides just one image, or if it always provides the same image. In the general case, this cannot be assumed. That means nothing delays the subsequent frame from proceeding immediately after the first frame is handed over to present.
The fences do not help either. After fence[i] signal, the code waits on fence[i+1], so the fences also do not prevent subsequent frames from progressing in the general case.
What I mean by all of that: The second frame starts rendering concurrently to the first frame and there is nothing that prevents it from accessing the depth buffer concurrently as far as I can tell.
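The following toy model in C makes the fence argument concrete. It is not Vulkan code, and the function names are hypothetical. The model assumes that frame n uses slot cf = n % MAX_FRAMES_IN_FLIGHT and waits on fence[cf] before submitting. That fence was last signaled by frame n - MAX_FRAMES_IN_FLIGHT. A fence signaled by vkQueueSubmit also covers all work submitted earlier to that queue, so the wait only guarantees that frames up to n - MAX_FRAMES_IN_FLIGHT have completed. The model deliberately ignores swapchain-image reuse.

```c
#include <assert.h>

/* Toy model of the tutorial's CPU-side frame pacing (not real Vulkan code).
   Frame n waits on fence[n % max_in_flight], which was last signaled by
   frame n - max_in_flight. Returns the newest earlier frame whose GPU work
   is guaranteed to be finished when frame n is submitted, or -1 if none. */
static int last_completed_frame_before(int n, int max_in_flight) {
    assert(n >= 0 && max_in_flight > 0);
    int waited = n - max_in_flight; /* frame that last signaled fence[cf] */
    return waited >= 0 ? waited : -1;
}

/* Nonzero if nothing on the CPU side orders frame n after frame n - 1,
   i.e. the two frames may touch a shared depth buffer concurrently. */
static int may_overlap_with_previous(int n, int max_in_flight) {
    return n > 0 && last_completed_frame_before(n, max_in_flight) < n - 1;
}
```

With MAX_FRAMES_IN_FLIGHT = 2, may_overlap_with_previous(1, 2) is true: nothing orders frame 1 after frame 0. Only MAX_FRAMES_IN_FLIGHT = 1 would serialize consecutive frames, and it does so at the cost of CPU/GPU overlap.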
The Fix:
If we want to use only a single depth buffer, though, we can fix the tutorial's code. The goal is for the ef and lf stages to wait until the previous draw call has completed before they proceed. In other words, we want to create the following scenario:
img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
img[1] -> sem1[1] signal -> t|...|________|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
where _ indicates a wait operation.
To achieve this, we have to add a barrier that prevents subsequent frames from performing the EARLY_FRAGMENT_TESTS and LATE_FRAGMENT_TESTS stages at the same time. The draw calls are performed on only one queue, so only the commands in the graphicsQueue require a barrier. The "barrier" can be established with a subpass dependency:
vkWaitForFences(..., fence[cf], ...);
vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
vkResetFences(..., fence[cf]);
vkQueueSubmit(graphicsQueue, ...
    /* wait for: */ sem1[cf], /* wait stage: */ EARLY_FRAGMENT_TESTS, ...
    vkCmdBeginRenderPass(cb[cf], ...);
        Subpass Dependency between EXTERNAL -> 0:
            srcStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
            srcAccess = DEPTH_STENCIL_ATTACHMENT_WRITE,
            dstStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
            dstAccess = DEPTH_STENCIL_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_READ
        ...
        vkCmdDrawIndexed(cb[cf], ...);
        (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
            srcStages = ALL_COMMANDS,
            srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
            dstStages = BOTTOM_OF_PIPE,
            dstAccess = 0
    vkCmdEndRenderPass(cb[cf]);
    /* signal when done: */ sem2[cf], ...
    /* signal when done: */ fence[cf]
);
vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
This should establish a proper barrier on the graphicsQueue between the draw calls of different frames. It is an EXTERNAL -> 0 subpass dependency, so its first synchronization scope covers commands submitted earlier to the same queue. The previous frame's depth accesses are therefore synchronized with the current frame's.
Update: The wait stage for sem1[cf] also has to be changed from COLOR_ATTACHMENT_OUTPUT to EARLY_FRAGMENT_TESTS. Layout transitions happen at vkCmdBeginRenderPass time, after the first synchronization scope (srcStages and srcAccess) and before the second synchronization scope (dstStages and dstAccess). The swapchain image must therefore already be available at that point, so that its layout transition happens at the right time.
No, rasterization order does not (per specification) extend outside a single subpass. If multiple subpasses write to the same depth buffer, then there should be a VkSubpassDependency between them. If something outside a render pass writes to the depth buffer, then there should also be explicit synchronization (via barriers, semaphores, or fences).
FWIW, I think the vulkan-tutorial sample is non-conformant. At least I do not see anything that would prevent a memory hazard on the depth buffer. It seems that the depth buffer should either be duplicated MAX_FRAMES_IN_FLIGHT times or be explicitly synchronized.
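If the depth buffer is duplicated instead, each framebuffer has to pair a swapchain image view with the depth view of the current frame slot. A framebuffer references specific image views (imageless framebuffers aside). One hypothetical indexing scheme, purely illustrative and not from the tutorial, uses one framebuffer per (swapchain image, frame slot) pair:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: depth image k belongs to frame slot k, and there is one
   framebuffer for every (swapchain image, frame slot) combination. */
static size_t framebuffer_count(size_t swapchain_image_count, size_t max_frames_in_flight) {
    return swapchain_image_count * max_frames_in_flight;
}

/* Index of the framebuffer that combines swapchain image `image_index`
   (from vkAcquireNextImageKHR) with the depth image of `frame_slot` (cf). */
static size_t framebuffer_index(size_t image_index, size_t frame_slot,
                                size_t max_frames_in_flight) {
    assert(frame_slot < max_frames_in_flight);
    return image_index * max_frames_in_flight + frame_slot;
}
```

Creating the framebuffer on the fly each frame would be another option. The scheme above simply trades a few extra framebuffer objects for not having to do that.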
The sneaky part about undefined behavior is that wrong code often works correctly. Unfortunately, making sync proofs in the validation layers is a little tricky, so for now the only option is to be careful.
Futureproofing the answer:
What I do see is the conventional WSI semaphore chain (used with vkAcquireNextImageKHR and vkQueuePresentKHR) with imageAvailable and renderFinished semaphores. There is only one subpass dependency with VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, which is chained to the imageAvailable semaphore. Then there are fences with MAX_FRAMES_IN_FLIGHT == 2, and fences guarding the individual swapchain images. This means two subsequent frames should run unimpeded with respect to each other, except in the rare case that they acquire the same swapchain image. So the depth buffer seems to be unprotected between two frames.