Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Prefetching data to cache for x86-64

In my application, at one point I need to perform calculations on a large contiguous block of memory data (100s of MBs). What I was thinking was to keep prefetching the part of the block my program will touch in future, so that when I perform calculations on that portion, the data is already in the cache.

Can someone give me a simple example of how to achieve this with gcc? I read _mm_prefetch somewhere, but don't know how to properly use it. Also note that I have a multicore system, but each core will be working on a different region of memory in parallel.

like image 735
pythonic Avatar asked Apr 25 '12 20:04

pythonic


People also ask

What is the difference between cache and prefetch?

Proactively prefetching data brings the data into the cache before the actual requests occur. Passively caching data, on the other hand, only fetches the missed data from the backend storage after the requests arrive. There is a trade-off between prefetching and caching.

What is L2 cache prefetching?

The last-level (L2) caches contain hardware stream prefetchers that are trained on streams of misses and software prefetches. If a hardware prefetcher detects a pattern in the misses it sees, it will begin prefetching future addresses in that pattern.

How do you use prefetch instructions?

You want to prefetch once per 64B cache line, and you'll need to tune how far ahead to prefetch. e.g. _mm_prefetch((char*)(A+64), _MM_HINT_NTA); and the same for B would prefetch 16*64 = 1024 bytes head of where you're loading, allowing for hiding some of the latency of a cache miss but still easily fitting in L1D.

What is prefetching in computer architecture?

Prefetching is the loading of a resource before it is required to decrease the time waiting for that resource. Examples include instruction prefetching where a CPU caches data and instruction blocks before they are executed, or a web browser requesting copies of commonly accessed web pages.


1 Answers

gcc uses builtin functions as an interface for lowlevel instructions. In particular for your case __builtin_prefetch. But you only should see a measurable difference when using this in cases where the access pattern is not easy to predict automatically.

like image 136
Jens Gustedt Avatar answered Oct 06 '22 11:10

Jens Gustedt