
Transferring memory from GPU to CPU with Vulkan and vkInvalidateMappedMemoryRanges synchronization?

Tags:

vulkan

In Vulkan, when I want to transfer some memory from the GPU back to the CPU, I think the most efficient way to do this is to write the data into memory which has the flags VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT.

Question #1: Is that assumption correct?

(Full list of available memory property flags can be found in Vulkan's documentation of VkMemoryPropertyFlagBits)

In order to get the latest data, I have to invalidate the memory using vkInvalidateMappedMemoryRanges, right?

Question #2: What is happening under the hood during vkInvalidateMappedMemoryRanges? Is this just a memcpy from some internal cache or can this be a longer procedure?

Question #3: If this could take longer (i.e. it is not a simple memcpy), then I probably should have some possibility to synchronize with the completion of it, right? However, vkInvalidateMappedMemoryRanges does not offer any synchronization parameters. Actually, my question is: IF I have to synchronize it, HOW do I synchronize it?

asked Jun 14 '19 08:06 by j00hi



1 Answer

Question #1: Is that assumption correct?

Probably not, but whether a better alternative is available depends on your platform. For GPU->CPU transfers there are really three options:

1. HOST_VISIBLE | HOST_COHERENT

This type is visible to the host and guaranteed to be coherent, but not cached on the host. CPU reads will be very slow, but that might be acceptable if you are only reading back a small amount of data. It might even be cheaper than issuing vkInvalidateMappedMemoryRanges(), and there is little point streaming data into the CPU cache if you never expect to touch it again on the CPU.

2. HOST_VISIBLE | HOST_CACHED

This type is visible to the host and cached, but not guaranteed to be coherent: the CPU and GPU might see different things at the same address if you don't manually enforce coherency. For this type of memory you must call vkInvalidateMappedMemoryRanges() after GPU writes and before CPU reads (or vkFlushMappedMemoryRanges() for the other direction) so that one processor can see what the other wrote; otherwise you might read stale data.

Data access will be fast once the data is in the cache, and you can benefit from CPU-side data fetch tricks such as explicit preloads and cache prefetching, but you will pay an overhead for the invalidate operation.
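The manual invalidate path has one practical detail. For non-coherent memory, the offset of each VkMappedMemoryRange must be a multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize. The size must be one of three things:

- VK_WHOLE_SIZE,
- a multiple of that atom size, or
- a length that reaches exactly to the end of the allocation.

The sketch below rounds an arbitrary byte range outward to satisfy those rules. The AlignedRange type and align_to_atom name are hypothetical. It assumes the whole allocation is currently mapped and that atom is the device's nonCoherentAtomSize. If you don't mind invalidating the whole mapping, passing offset 0 and size VK_WHOLE_SIZE is the simpler alternative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; in real code these two values would go into
   the offset/size fields (VkDeviceSize) of a VkMappedMemoryRange. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} AlignedRange;

/* Expand [offset, offset + size) outward so that it satisfies the
   nonCoherentAtomSize rules for vkInvalidateMappedMemoryRanges and
   vkFlushMappedMemoryRanges. Assumes the whole allocation of alloc_size
   bytes is mapped. */
AlignedRange align_to_atom(uint64_t offset, uint64_t size,
                           uint64_t atom, uint64_t alloc_size)
{
    assert(atom > 0);
    AlignedRange r;
    uint64_t begin = (offset / atom) * atom;                    /* round start down */
    uint64_t end = ((offset + size + atom - 1) / atom) * atom;  /* round end up */
    if (end > alloc_size) {
        /* offset + size == allocation size is also a valid range end */
        end = alloc_size;
    }
    r.offset = begin;
    r.size = end - begin;
    return r;
}
```

For example, with a 64-byte atom, the range of 10 bytes at offset 100 expands to 64 bytes at offset 64. This covers a little more memory than requested, which is harmless for an invalidate.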

3. HOST_VISIBLE | HOST_CACHED | HOST_COHERENT

Finally, you have the host-cached AND coherent memory type, which gives you the best of both if you have high-bandwidth reads to make on the CPU. The hardware provides coherency automatically, so there is no need to invalidate, BUT this type is not guaranteed to be available on all platforms. For bulk data reads on the CPU I would expect this to be the most efficient option where it is available.

It's worth noting that there is no single "best" memory type for all allocations. Do not use host-cached or host-coherent memory for things you never expect to transfer back to the CPU (memory coherency isn't free in terms of power or memory performance).
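The preference order from the three options above can be written as a small memory-type search. This is a sketch, not a canonical algorithm. The inputs stand in for real Vulkan data as follows:

- The flags array stands in for VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags.
- type_bits stands in for VkMemoryRequirements::memoryTypeBits.
- The MEM_* constants are local copies of the VkMemoryPropertyFlagBits values, so the code compiles without vulkan.h.

The order favors bulk readback. For small reads, plain coherent memory (the first option) may be just as good.

```c
#include <stdint.h>

/* Local copies of VkMemoryPropertyFlagBits values (assumed to match vulkan.h). */
enum {
    MEM_DEVICE_LOCAL  = 0x1, /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT  */
    MEM_HOST_VISIBLE  = 0x2, /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
    MEM_HOST_COHERENT = 0x4, /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
    MEM_HOST_CACHED   = 0x8  /* VK_MEMORY_PROPERTY_HOST_CACHED_BIT   */
};

/* Return the first memory type index allowed by type_bits whose flags
   contain every bit in `wanted`, or -1 if there is none. */
static int find_type(const uint32_t *flags, uint32_t count,
                     uint32_t type_bits, uint32_t wanted)
{
    for (uint32_t i = 0; i < count; ++i) {
        if ((type_bits & (1u << i)) && (flags[i] & wanted) == wanted) {
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical helper: try cached+coherent, then cached (needs manual
   invalidation), then plain coherent. Sets *needs_invalidate accordingly. */
int pick_readback_type(const uint32_t *flags, uint32_t count,
                       uint32_t type_bits, int *needs_invalidate)
{
    const uint32_t prefs[3] = {
        MEM_HOST_VISIBLE | MEM_HOST_CACHED | MEM_HOST_COHERENT,
        MEM_HOST_VISIBLE | MEM_HOST_CACHED,
        MEM_HOST_VISIBLE | MEM_HOST_COHERENT,
    };
    for (int p = 0; p < 3; ++p) {
        int idx = find_type(flags, count, type_bits, prefs[p]);
        if (idx >= 0) {
            *needs_invalidate = (flags[idx] & MEM_HOST_COHERENT) == 0;
            return idx;
        }
    }
    return -1;
}
```

When needs_invalidate comes back non-zero, the caller must follow the HOST_CACHED rules above and invalidate before each CPU read of GPU-written data.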

Question #2: What is happening under the hood during vkInvalidateMappedMemoryRanges? Is this just a memcpy from some internal cache or can this be a longer procedure?

In the case where you have non-coherent memory, it does whatever is needed to make the CPU's view of the memory coherent with what the GPU wrote. Typically this means invalidating (discarding) lines in the CPU cache that may contain stale copies of the data, so that subsequent CPU reads fetch the version the GPU actually wrote.

Question #3: If this could take longer (i.e. it is not a simple memcpy), then I probably should have some possibility to synchronize with the completion of it, right?

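No. Invalidation is a CPU-side operation: it takes CPU time, the CPU is busy while it runs, and once vkInvalidateMappedMemoryRanges returns the invalidation is complete, so there is nothing further to wait for. What you do need to synchronize with is the GPU work that wrote the data: wait for it to complete (for example with a fence) before invalidating and reading. In general you can avoid the need to invalidate at all by using coherent memory, though.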
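The point about where synchronization belongs can be illustrated with a deliberately simplified, single-threaded toy model in C. It has one memory word and one CPU cache line. Plain functions stand in for the GPU write completing (as observed via vkWaitForFences) and for vkInvalidateMappedMemoryRanges. This is not how a driver actually implements invalidation, and all names are hypothetical.

The model shows that invalidating before the GPU has finished lets the CPU re-cache old data. The working order is therefore: wait for completion, then invalidate, then read. In real Vulkan code the wait is typically vkWaitForFences on the submission's fence. That wait is preceded in the command buffer by a pipeline barrier whose destination is VK_PIPELINE_STAGE_HOST_BIT with VK_ACCESS_HOST_READ_BIT, as in the Khronos synchronization examples.

```c
#include <stdbool.h>

/* Toy model: `memory` is what the GPU wrote; the CPU may hold an older
   copy in its (non-coherent) cache line. */
typedef struct {
    int memory;        /* value in device-visible memory */
    int cache_value;   /* CPU's cached copy of that word */
    bool cache_valid;  /* whether the CPU cache currently holds the line */
} ToySystem;

/* CPU read: cache hit if the line is present, otherwise fill from memory. */
static int cpu_read(ToySystem *s)
{
    if (!s->cache_valid) {
        s->cache_value = s->memory;
        s->cache_valid = true;
    }
    return s->cache_value;
}

/* Stand-in for vkInvalidateMappedMemoryRanges: synchronous, just drops
   the cached line so the next read goes to memory. */
static void invalidate(ToySystem *s) { s->cache_valid = false; }

/* Stand-in for the GPU finishing its write (what a fence wait observes). */
static void gpu_complete(ToySystem *s, int value) { s->memory = value; }

/* Correct order: GPU work complete (fence waited on), then invalidate,
   then read. */
int readback_correct(ToySystem *s, int gpu_value)
{
    gpu_complete(s, gpu_value);
    invalidate(s);
    return cpu_read(s);
}

/* Wrong order: invalidate and touch the data while the GPU is still busy. */
int readback_too_early(ToySystem *s, int gpu_value)
{
    invalidate(s);
    (void)cpu_read(s);          /* refills the cache with the old value */
    gpu_complete(s, gpu_value); /* GPU write lands in memory, not the cache */
    return cpu_read(s);         /* cache hit: still the old value */
}
```

Starting from a cached old value of 1 and a GPU result of 42, readback_correct returns 42, while readback_too_early returns the stale 1.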

answered Oct 11 '22 18:10 by solidpixel