My understanding is that hardware prefetching will never cross page boundaries. I'm wondering if a software prefetch has the same restriction i.e. can I use a software prefetch to avoid a future TLB miss. From searching around, it appears to be possible, but I couldn't find anything definitive in the documentation, so a reference would be good.
I'm specifically interested in Nehalem, Sandy Bridge and Westmere.
Cache prefetching is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to a faster local memory before it is actually needed (hence the term 'prefetch').
With software prefetching the programmer or compiler inserts prefetch instructions into the program. These are instructions that initiate a load of a cache line into the cache, but do not stall waiting for the data to arrive.
Prefetching in the Linux kernel is beneficial for sequential accesses to a file, that is, accesses to consecutive blocks of that file. When a file is not accessed sequentially, prefetch- ing can potentially result in extra I/Os by reading data that is not used.
According to Intel's Optimization Reference Manual, it depends on the processor. From section 7.4.3:
There are cases where a PREFETCH will not perform the data prefetch. These include:
- PREFETCH causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with CPUID signature corresponding to family 15, model 0, 1, or 2. PREFETCH resolves DTLB misses and fetches data on Pentium 4 processors with CPUID signature corresponding to family 15, model 3.
- An access to the specified address that causes a fault/exception.
Software prefetching may or may not avoid TLB misses, depending on the processor. It will not fetch the data if it would cause a page fault.
If you want ensure you avoid TLB misses, you could do a dummy read to load the data instead of a prefetch instruction. This could cause a page fault to swap in a page, which could be either good or bad depending on your use case.
In modern processors (Nehalem, Sandy Bridge and Westmere) software prefetching does indeed trigger a TLB lookup.
From the Intel optimization guide: (section 7.3.3)
In older microarchitectures, PREFETCH causing a Data Translation Lookaside Buffer (DTLB) miss would be dropped. In processors based on Nehalem, Westmere, Sandy Bridge, and newer microar-chitectures, Intel Core 2 processors, and Intel Atom processors, PREFETCH causing a DTLB miss can be fetched across a page boundary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With