Is cache prefetching done in physical address space or virtual address space?

Does the hardware prefetcher operate on contiguous virtual addresses, or on contiguous physical addresses? Imagine a large array of bytes that spans multiple pages. In the virtual address space the bytes are contiguous, but the backing pages could be disjoint in physical memory. I would hope that the prefetcher does the appropriate translation through the TLB before it starts to bring in cache lines that belong to the next page.

Is this so? I couldn't find information that confirmed this and was hoping someone could give more insight.

I'm asking for x86 mainly, but any insight would be appreciated

asked Mar 23 '17 by thisisdog



1 Answer

I can't answer this for AMD processors, but I can answer it for Intel ones.

As far as I know, the hardware prefetcher(s) should not prefetch cache lines across page boundaries on current Intel processors.

From Intel's Intel® 64 and IA-32 Architectures Optimization Reference Manual, section 7.5.2, Hardware Prefetch:

Automatic hardware prefetch can bring cache lines into the unified last-level cache based on prior data misses. It will attempt to prefetch two cache lines ahead of the prefetch stream. Characteristics of the hardware prefetcher are:

  • [...]
  • It will not prefetch across a 4-KByte page boundary. A program has to initiate demand loads for the new page before the hardware prefetcher starts prefetching from the new page.

The above paragraph is talking about the "unified last-level cache", but things aren't any better in L1d land:

2.3.5.4, Data Prefetching

Data Prefetch to L1 Data Cache

Data prefetching is triggered by load operations when the following conditions are met:

  • [...]

  • The prefetched data is within the same 4K byte page as the load instruction that triggered it.

Or in L2:

The following two hardware prefetchers fetched data from memory to the L2 cache and last level cache:

Spatial Prefetcher: [...]

Streamer: This prefetcher monitors read requests from the L1 cache for ascending and descending sequences of addresses. Monitored read requests include L1 DCache requests initiated by load and store operations and by the hardware prefetchers, and L1 ICache requests for code fetch. When a forward or backward stream of requests is detected, the anticipated cache lines are prefetched. Prefetched cache lines must be in the same 4K page.

However, the processor might prefetch paging data. From Intel's Intel® 64 and IA-32 Architectures Software Developer Manuals, Volume 3A, 4.10.2.3, Details of TLB Use:

The processor may cache translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path.

Volume 3A, 4.10.3.1, Caches for Paging Structures:

The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path.

I know you asked about hardware prefetching, but you should be able to use software prefetching for data (not instructions):

In older microarchitectures, PREFETCH causing a Data Translation Lookaside Buffer (DTLB) miss would be dropped. In processors based on Nehalem, Westmere, Sandy Bridge, and newer microarchitectures, Intel Core 2 processors, and Intel Atom processors, PREFETCH causing a DTLB miss can be fetched across a page boundary.

answered Sep 21 '22 by 11181