
How does _mm_prefetch work?

According to its documentation, the _mm_prefetch call prefetches the contents of a given memory location in RAM into a cache line. But the cache is completely under hardware control, right? Based on which memory is accessed a lot (spatial/temporal locality), the hardware prefetches the contents from memory into the cache. I thought that programmers have no control over the cache and that it is purely a hardware mechanism.

So is my understanding wrong, and can the cache actually be controlled by us?

If _mm_prefetch can control what is put in the cache,

  1. does that mean the data will never be removed from the cache? If not, when is it removed?

  2. what is the equivalent assembly-level instruction that operates on the cache?

Jsmith asked Dec 19 '22 10:12


1 Answer

We can always move data into the cache, if it is enabled, simply by performing a memory access.
We can prefetch a variable simply by "touching" it ahead of time; we don't need a special instruction for that.
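A minimal sketch of "touching" data ahead of time with an ordinary load, assuming a 64-byte cache line. `LINE_SIZE`, `TOUCH_AHEAD_LINES` and `sum_with_touch_ahead` are hypothetical names chosen for illustration, and the look-ahead distance is not a tuned value. Unlike a PREFETCHh instruction, this is a real load, so it does use a register and the load unit.

```c
#include <stddef.h>

#define LINE_SIZE 64          /* assumed cache-line size in bytes */
#define TOUCH_AHEAD_LINES 4   /* hypothetical look-ahead distance, in lines */

/* Sum a buffer while reading one byte TOUCH_AHEAD_LINES lines ahead, once every
   LINE_SIZE bytes. That plain load is the "touch": it starts bringing the line
   into the cache early. The volatile access keeps the compiler from dropping
   the otherwise unused load. */
unsigned long sum_with_touch_ahead(const unsigned char *buf, size_t n) {
    const size_t ahead = (size_t)TOUCH_AHEAD_LINES * LINE_SIZE;
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_SIZE == 0 && i + ahead < n) {
            (void)*(const volatile unsigned char *)&buf[i + ahead];
        }
        sum += buf[i];
    }
    return sum;
}
```

The touch does not change the result; it only changes when the line is requested.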

It's unclear what you mean by "control over the cache", since we can enable or disable it and set its mode, its fill/spill policy, and its sharing mode with other hardware threads.
We can also fill the cache with data and, by clever use of address arithmetic, force the eviction of a line.
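A rough sketch of forcing an eviction by filling a cache set, assuming a hypothetical 32 KiB, 8-way cache with 64-byte lines (64 sets). All names and numbers here are illustrative assumptions, not values for any particular CPU. With this geometry the set-index bits fall inside a 4 KiB page offset, so virtual addresses are enough; larger L2/L3 caches are indexed by physical address and would need that information. Real replacement policies are not strict LRU, so touching exactly `NUM_WAYS` lines may not always be sufficient.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry: 32 KiB, 8-way, 64-byte lines. Check your CPU. */
#define LINE_SIZE  64u
#define NUM_SETS   64u
#define NUM_WAYS   8u
#define SET_STRIDE (LINE_SIZE * NUM_SETS)  /* addresses this far apart share a set */

/* Set index of an address under the assumed geometry. */
static size_t cache_set_of(uintptr_t addr) {
    return (size_t)((addr / LINE_SIZE) % NUM_SETS);
}

/* Smallest whole-line offset from base that lands in the same set as target. */
static size_t conflict_offset(uintptr_t base, uintptr_t target) {
    return ((cache_set_of(target) + NUM_SETS - cache_set_of(base)) % NUM_SETS) * LINE_SIZE;
}

/* Read NUM_WAYS distinct lines of pool that map to target's set. Under an
   LRU-like policy this fills every way of that set and pushes the target line
   out. pool must hold at least NUM_WAYS * SET_STRIDE bytes and must not
   contain target's line. */
void evict_by_conflict(const volatile unsigned char *pool, const void *target) {
    size_t off = conflict_offset((uintptr_t)pool, (uintptr_t)target);
    for (size_t w = 0; w < NUM_WAYS; w++) {
        (void)pool[off + w * SET_STRIDE];
    }
}
```

The arithmetic is the whole trick: lines that are `SET_STRIDE` bytes apart compete for the same set, so reading enough of them displaces whatever else was there.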

Your assumption that programmers have no control whatsoever over the cache is therefore not entirely valid, though not incorrect either: the CPU is free to implement any cache policy it wants as long as it respects the documented specification (including having no cache at all or spilling the cache every X clock ticks).
One thing we cannot do, yet, is pin lines in the cache: we cannot tell the CPU never to evict a specific line.

EDIT: As @Mysticial pointed out in the comments, it is possible to pin data into the L3 cache on newer Intel CPUs.


The PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA and PREFETCHWT1 instructions, which _mm_prefetch compiles to, are just hints for the hardware prefetchers, if present, active, and willing to respect the hint¹.
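A small sketch of how the `_mm_prefetch` hints correspond to those instructions, assuming an x86 compiler with SSE and `<xmmintrin.h>`. The wrapper names and `PREFETCH_AHEAD` are hypothetical. The cache levels in the comments follow Intel's general description and are implementation-specific. PREFETCHWT1 is left out because only a few CPUs support it.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants (SSE) */

/* One wrapper per hint, because the hint argument must be a compile-time
   constant. Each comment names the instruction compilers normally emit. */
static inline void prefetch_t0(const void *p)  { _mm_prefetch((const char *)p, _MM_HINT_T0); }  /* PREFETCHT0: all levels */
static inline void prefetch_t1(const void *p)  { _mm_prefetch((const char *)p, _MM_HINT_T1); }  /* PREFETCHT1: L2 and higher */
static inline void prefetch_t2(const void *p)  { _mm_prefetch((const char *)p, _MM_HINT_T2); }  /* PREFETCHT2: L3 and higher */
static inline void prefetch_nta(const void *p) { _mm_prefetch((const char *)p, _MM_HINT_NTA); } /* PREFETCHNTA: non-temporal */

#define PREFETCH_AHEAD 64  /* hypothetical distance in elements, not tuned */

/* Sum an int array, hinting the element PREFETCH_AHEAD positions ahead.
   The hint never changes the result; it may or may not be acted upon. */
long sum_prefetch_t0(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            prefetch_t0(&a[i + PREFETCH_AHEAD]);
        s += a[i];
    }
    return s;
}
```

Since these are only hints, the program behaves the same whether or not the CPU honors them; only timing can differ.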

The limited use cases³ of these instructions stem more from the finer control they give over which cache level the data ends up in, and from their reduced use of core resources², than from their role as a way to move data into the cache.
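A sketch of choosing the cache level by how the data will be reused, assuming x86 with SSE and a 64-byte line. The input buffer is read only once, so it is hinted with `_MM_HINT_NTA`, which is intended to limit how much other data it displaces, while the small, constantly reused `counts` table is left to the normal cache policy. `histogram_nta` and `STREAM_AHEAD` are hypothetical names, and whether NTA actually helps here depends on the CPU.

```c
#include <stddef.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

#define STREAM_AHEAD 256  /* hypothetical prefetch distance in bytes */

/* Count byte values in a buffer that is read exactly once. */
void histogram_nta(const unsigned char *in, size_t n, unsigned counts[256]) {
    memset(counts, 0, 256 * sizeof counts[0]);
    for (size_t i = 0; i < n; i++) {
        /* One hint per assumed 64-byte line is enough; more would be redundant. */
        if (i % 64 == 0 && i + STREAM_AHEAD < n)
            _mm_prefetch((const char *)&in[i + STREAM_AHEAD], _MM_HINT_NTA);
        counts[in[i]]++;
    }
}
```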

Once a line has been prefetched, it is evicted from the cache like any other line.


1 These hardware prefetchers are usually triggered by memory access patterns (like sequential accesses) and are asynchronous with respect to the execution flow.

2 They are asynchronous by nature (they complete quickly, locally) and may not occupy the core resources a load would (e.g., a register, the load unit, and so on).

3 While one may think that a hint is at worst useless (if not respected), it can actually turn out that prefetching degrades performance.

Margaret Bloom answered Jan 01 '23 14:01