
What is the difference between MOVDQA and MOVNTDQA, and VMOVDQA and VMOVNTDQ for WB/WC marked regions?

What is the main difference between these instructions when they operate on memory marked as WB (write-back) versus WC (write-combining)? Specifically, what is the difference between MOVDQA and MOVNTDQA, and between VMOVDQA and VMOVNTDQ?

Is it right that for memory marked as WC, the [NT] instructions are no different from the usual ones (without [NT]), and that for memory marked as WB, the [NT] instructions treat it as if it were WC memory?

Alex asked Sep 26 '13 18:09



1 Answer

Note: This answer primarily discusses NT stores. Peter's answer is more comprehensive.


You would typically use the NT (non-temporal) instructions when writing to memory-mapped I/O (e.g. a GPU), where the memory is uncacheable and is always accessed directly.

With regular reads and writes, the CPU tries to cache the data and writes larger blocks out to main memory when it needs to. With uncacheable regions (such as MMIO), writes have to go directly to memory and the CPU does not try to cache them. Using an NT instruction hints to the CPU that you are probably streaming a large amount of data (e.g. to a frame buffer), and it will try to combine those writes so that it can write out an entire cache line at once.
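To make the "fill an entire cache line" idea concrete, here is a minimal sketch in C using the SSE2 intrinsic `_mm_stream_si128`, which compiles to MOVNTDQ. The helper name `stream_fill` and the 64-byte `CACHE_LINE` constant are assumptions for illustration, not something from the question; the line size in particular should be checked for the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_set1_epi8, _mm_sfence */

#define CACHE_LINE 64   /* assumed line size; typical for current x86 CPUs */

/* Hypothetical helper: fill `bytes` bytes at `dst` with `value` using
 * non-temporal stores. `dst` must be 64-byte aligned and `bytes` must be
 * a multiple of 64, so each loop iteration writes one complete line. */
void stream_fill(void *dst, uint8_t value, size_t bytes)
{
    assert(((uintptr_t)dst % CACHE_LINE) == 0);
    assert(bytes % CACHE_LINE == 0);

    const __m128i v = _mm_set1_epi8((char)value);
    uint8_t *p = dst;

    for (size_t off = 0; off < bytes; off += CACHE_LINE) {
        /* Four back-to-back 16-byte NT stores cover one 64-byte line,
         * which gives the CPU the chance to combine them into one write. */
        _mm_stream_si128((__m128i *)(p + off +  0), v);
        _mm_stream_si128((__m128i *)(p + off + 16), v);
        _mm_stream_si128((__m128i *)(p + off + 32), v);
        _mm_stream_si128((__m128i *)(p + off + 48), v);
    }

    /* Order the NT stores before any stores that follow this call. */
    _mm_sfence();
}
```

Writing whole lines in address order is a design choice here: partially filled lines may be flushed as several smaller writes, which would likely lose most of the benefit.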

The "non-temporal" part means that you're telling the CPU that the data will not be needed again soon, so the write does not have to happen immediately; it can be delayed, within reason, until enough NT instructions have been issued to fill the cache line.

As far as I understand, you can also use the NT instructions with regular write-back memory; the CPU will not attempt to cache those writes and will again attempt to stream them when it can fill a line. Writing to WB memory this way is a fairly specialized case: you would need to know that you can do a better job than the CPU at managing its cache. Also, the write is not going to happen immediately, so another core or device reading the location afterwards could see stale data until the combined write is executed. If you need to flush any outstanding combined writes, you have to manage this with SFENCE instructions.
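As a rough illustration of NT stores to ordinary write-back memory followed by SFENCE, the sketch below copies a buffer with regular loads (`_mm_load_si128`, MOVDQA) and non-temporal stores (`_mm_stream_si128`, MOVNTDQ). The helper name `nt_copy` and its alignment requirements are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence */

/* Hypothetical helper: copy `bytes` bytes from `src` to `dst` using
 * non-temporal stores, so the destination is not pulled into the cache.
 * Both pointers must be 16-byte aligned and `bytes` a multiple of 16. */
void nt_copy(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = dst;
    const __m128i *s = src;

    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_load_si128(s + i);  /* ordinary cached load (MOVDQA) */
        _mm_stream_si128(d + i, v);         /* non-temporal store (MOVNTDQ)  */
    }

    /* NT stores are weakly ordered. SFENCE orders them before any later
     * store, such as a "data ready" flag that another core polls. */
    _mm_sfence();
}
```

A caller that publishes the buffer to another thread would set its flag only after `nt_copy` returns, so the fence sits between the streamed data and the flag store.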

J... answered Sep 19 '22 08:09
