
How can I prefetch a memory region most easily?

Background: I've implemented a stochastic algorithm that requires random ordering for best convergence. Doing so obviously destroys memory locality, however. I've found that by prefetching the next iteration's data, the performance drop is minimized.

I can prefetch n cache lines using _mm_prefetch in a simple, mostly OS+compiler-portable fashion - but what's the length of a cache line? Right now, I'm using a hardcoded value of 64, which seems to be the norm nowadays on x64 processors - but I don't know how to detect this at runtime, and a question about this last year found no simple solution.

I've seen GetLogicalProcessorInformation on Windows, but I'm leery of using such a complex API for something so simple, and it won't work on Macs or Linux anyhow.

Perhaps there's some entirely different API/intrinsic that could prefetch a memory region identified in terms of bytes (or words, or whatever), so that I could prefetch without knowing the cache line length?

Basically, is there a reasonable alternative to _mm_prefetch with #define CACHE_LINE_LEN 64?
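For reference, the hardcoded approach described above might look like the following minimal sketch. The function name `prefetch_region` and the macro `PREFETCH_LINE` are hypothetical names introduced for illustration, the `_MM_HINT_T0` hint is an arbitrary choice, and the value 64 is an assumption rather than a detected line length. On non-x64 targets the sketch compiles to a no-op.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_LINE(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_LINE(p) ((void)(p))  /* no-op where _mm_prefetch is unavailable */
#endif

#define CACHE_LINE_LEN 64  /* assumed, not detected; must be a power of two */

/* Issue one prefetch for every cache line that overlaps [addr, addr + len).
   Returns the number of prefetches issued (0 for an empty region). */
static size_t prefetch_region(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_LEN - 1);
    const uintptr_t first = (uintptr_t)addr & mask;             /* line holding the first byte */
    const uintptr_t last = ((uintptr_t)addr + len - 1) & mask;  /* line holding the last byte */

    size_t count = 0;
    for (uintptr_t p = first; ; p += CACHE_LINE_LEN) {
        PREFETCH_LINE(p);
        ++count;
        if (p == last)
            break;
    }
    return count;
}
```

A caller would invoke something like `prefetch_region(next_item, sizeof *next_item)` one iteration ahead of use. If the assumed line length is too large, some lines are skipped; if it is too small, some prefetches are redundant but harmless.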

Asked by Eamon Nerbonne, Oct 20 '10 at 15:10



1 Answer

There's a question asking just about the same thing here. You can read the cache line size via the CPUID instruction if you feel like delving into some assembly. You'll have to write platform-specific code for this, of course.
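As a rough illustration of the CPUID route, the sketch below avoids hand-written assembly by using the compiler helpers `__get_cpuid` (GCC/Clang, `<cpuid.h>`) and `__cpuid` (MSVC, `<intrin.h>`). It reads CPUID leaf 1, where bits 15:8 of EBX give the CLFLUSH line size in 8-byte units when the CLFSH feature bit (EDX bit 19) is set. Treating the CLFLUSH line size as the prefetch granularity is an assumption, as are the function name `cache_line_len_from_cpuid` and the fallback of 64.

```c
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>   /* __get_cpuid (GCC/Clang) */
#define SKETCH_GNU_CPUID 1
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>  /* __cpuid (MSVC) */
#define SKETCH_MSVC_CPUID 1
#endif

#define FALLBACK_LINE_LEN 64  /* assumed default when CPUID gives no answer */

/* Returns the CLFLUSH line size in bytes as reported by CPUID leaf 1,
   or FALLBACK_LINE_LEN if it cannot be determined on this platform. */
static size_t cache_line_len_from_cpuid(void)
{
    unsigned ebx = 0, edx = 0;
#if defined(SKETCH_GNU_CPUID)
    unsigned eax = 0, ecx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return FALLBACK_LINE_LEN;  /* leaf 1 not supported */
#elif defined(SKETCH_MSVC_CPUID)
    int regs[4] = {0, 0, 0, 0};    /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    ebx = (unsigned)regs[1];
    edx = (unsigned)regs[3];
#endif
    if (!(edx & (1u << 19)))       /* CLFSH feature bit clear: field is not valid */
        return FALLBACK_LINE_LEN;

    size_t len = (size_t)((ebx >> 8) & 0xFFu) * 8u;  /* field is in 8-byte units */
    return len ? len : FALLBACK_LINE_LEN;
}
```

Calling this once at startup and caching the result keeps the query out of the hot loop. On non-x86 targets the sketch simply returns the fallback.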

You're probably already familiar with Agner Fog's optimization manuals, which give the cache information for many popular processors. If you can determine which CPUs you expect to encounter, you can just hard-code their cache line sizes and look up the CPU vendor information at runtime to select the line size.
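The table-driven variant could be sketched roughly as follows. The table contents are placeholders: 64 bytes is the usual figure for x64 parts from both vendors, but the entries should be checked against the manuals for the CPUs actually targeted. The names `KNOWN_VENDORS`, `line_len_for_vendor`, and `read_cpu_vendor` are hypothetical, and the vendor lookup here only covers GCC/Clang on x86 via `__get_cpuid` (leaf 0 returns the 12-character vendor string in EBX, EDX, ECX).

```c
#include <stddef.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>  /* __get_cpuid (GCC/Clang) */
#define SKETCH_HAVE_CPUID 1
#endif

struct vendor_line_len {
    const char *vendor;  /* 12-character CPUID vendor string */
    size_t line_len;     /* hard-coded cache line size in bytes */
};

/* Placeholder entries; verify against the manuals for the CPUs you target. */
static const struct vendor_line_len KNOWN_VENDORS[] = {
    { "GenuineIntel", 64 },
    { "AuthenticAMD", 64 },
};

/* Returns the hard-coded line size for a vendor string, or 0 if unknown. */
static size_t line_len_for_vendor(const char *vendor)
{
    for (size_t i = 0; i < sizeof KNOWN_VENDORS / sizeof KNOWN_VENDORS[0]; ++i)
        if (strcmp(vendor, KNOWN_VENDORS[i].vendor) == 0)
            return KNOWN_VENDORS[i].line_len;
    return 0;
}

/* Writes the NUL-terminated vendor string into out, or an empty string
   when CPUID is not available through this sketch. */
static void read_cpu_vendor(char out[13])
{
    memset(out, 0, 13);
#if defined(SKETCH_HAVE_CPUID)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        memcpy(out, &ebx, 4);      /* vendor string order is EBX, EDX, ECX */
        memcpy(out + 4, &edx, 4);
        memcpy(out + 8, &ecx, 4);
    }
#endif
}
```

A caller would read the vendor once, look it up, and fall back to a default such as 64 when the lookup returns 0. A real table would more likely be keyed on family and model as well, since line size is a property of the microarchitecture rather than of the vendor.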

Answered by Ron Warholic, Sep 21 '22 at 11:09