 

Can "non-native" pointers hurt cache performance?

As far as I can tell, hardware prefetchers will at the very least detect and fetch constant strides through memory. Additionally, they can monitor data access patterns, whatever that really means. Which led me to wonder: do hardware prefetchers ever base their decisions on the actual data stored in memory, or purely on the behaviour a program is exhibiting?

The reason I ask is that I will occasionally use "non-native" pointers in place of real pointers. A simple example would be a preallocated array of stuff, with small integers indexing into this array instead of pointers. If I need to store a whole lot of such "pointers", the memory savings can add up quickly and in turn indirectly improve cache performance, simply because less memory is touched.
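A minimal sketch of the index-as-pointer idea, assuming a hypothetical `Node` type and `Pool` container (neither name comes from a real library). Each link is a 32-bit index into preallocated storage, where a native pointer would take 8 bytes on a typical 64-bit target:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical node type: the link is a 32-bit index into a pool,
// not a native pointer.
struct Node {
    int value;
    std::uint32_t next;  // index of the next node in the pool, or kNil
};

// Sentinel index playing the role of nullptr (an assumption of this sketch).
constexpr std::uint32_t kNil = 0xFFFFFFFFu;

// Storage for all nodes; every "pointer" is an index into `nodes`.
struct Pool {
    std::vector<Node> nodes;

    // Store a new node in front of `head` and return its index.
    std::uint32_t push_front(std::uint32_t head, int value) {
        nodes.push_back(Node{value, head});
        return static_cast<std::uint32_t>(nodes.size() - 1);
    }

    // Walk the list starting at `head` and sum the stored values.
    long sum(std::uint32_t head) const {
        long total = 0;
        for (std::uint32_t i = head; i != kNil; i = nodes[i].next) {
            assert(i < nodes.size());  // a stale index would be a bug
            total += nodes[i].value;
        }
        return total;
    }
};
```

Building a list is then a matter of starting from `kNil` and repeatedly calling `push_front`, keeping the returned index as the new head.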

But for all I know, this might interfere with how hardware prefetchers work. Or not!

I can certainly imagine, realistic or not, a prefetching unit that examines cache lines as they enter the L1 cache, looks for native pointer addresses in them, and starts fetching those addresses into L2 or some such thing. In that case, my clever trick for saving memory suddenly seems decidedly less clever.

So, what do modern hardware prefetchers do, really? Can they be tripped up by "non-native" pointers?

porgarmingduod asked Nov 13 '13 13:11


2 Answers

The hardware prefetcher doesn't see pointers; it sees memory addresses. It doesn't care where an address came from, or what type it had in the C++ program you wrote. It just looks at which addresses the CPU is being told to read from or write to.

So no, indexing into an array is not going to be a scary new thing that the CPU has never encountered before.
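As a small illustration (a sketch with hypothetical helper names, not a description of any particular CPU), an indexed access and a native-pointer access to the same element produce the same address, and that address is all the memory system gets to see:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address that a read of base[index] would touch.
template <typename T>
std::uintptr_t address_via_index(const T* base, std::size_t index) {
    return reinterpret_cast<std::uintptr_t>(&base[index]);
}

// Address that a read through a native pointer would touch.
template <typename T>
std::uintptr_t address_via_pointer(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

For an array `int data[8]`, `address_via_index(data, 5)` and `address_via_pointer(data + 5)` compare equal; the index form only adds a little address arithmetic before the load is issued.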

jalf answered Oct 20 '22 14:10


Prefetching for linked data structures (LDS) is still a known problem in computer architecture. I'm not aware of any modern CPU that actually does it, but in theory it's possible. Several academic papers over the years have proposed variations on:

  1. Dedicated HW that detects address-like values within fetched cache lines and issues prefetches to those addresses.
  2. A compiler-assisted technique where the compiler recognizes the data structure dependencies and inserts SW prefetches or other hints.

Both of these methods may be affected by your technique: the first would be rendered useless, since a small index doesn't look like an address, while the second may still work if the compiler is sufficiently clever.
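To make the second approach concrete, here is a hand-written stand-in for the kind of SW prefetch a compiler might insert when links are indices. It is only a sketch: `Item`, `kEnd`, `prefetch_hint` and `sum_with_prefetch` are hypothetical names, and `__builtin_prefetch` is a GCC/Clang builtin that acts purely as a hint (on other compilers this version does nothing).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical element type whose link is an index, not a pointer.
struct Item {
    int value;
    std::uint32_t next;  // index of the next item, or kEnd
};

constexpr std::uint32_t kEnd = 0xFFFFFFFFu;  // end-of-list sentinel

// Thin wrapper so the sketch still compiles without the builtin.
inline void prefetch_hint(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);  // hint only; the CPU may ignore it
#else
    (void)addr;
#endif
}

// Sum a list, requesting the next item before working on the current one.
long sum_with_prefetch(const std::vector<Item>& items, std::uint32_t head) {
    long total = 0;
    std::uint32_t i = head;
    while (i != kEnd) {
        assert(i < items.size());
        const std::uint32_t next = items[i].next;
        if (next != kEnd && next < items.size()) {
            // The index is turned into an address here, in software.
            prefetch_hint(&items[next]);
        }
        total += items[i].value;
        i = next;
    }
    return total;
}
```

Prefetching only one element ahead with this little work per element may well not help at all; whether it does is something to measure on the target machine.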

Of course, you'd have to actually run on such a machine for this to matter, so it's only theoretical, and you shouldn't have to change your practice if it works fine for you. Still, it goes to show that profiling should be done per micro-architecture and system: what helps you in one case may be less efficient in another.
Generally speaking, don't just trust the CPU to do (or not do) some optimization unless it's documented; always check that you get the expected behavior.

By the way, note that even if the HW does look at the contents of memory, what it finds there are virtual addresses. The HW would have to do some sort of translation to a physical address before using them anyway, so in a sense there doesn't have to be any additional overhead.

Some bibliography:

  • Compiler-Directed Content-Aware Prefetching for Dynamic Data Structures
  • Dependence Based Prefetching for Linked Data Structures
  • Guided Region Prefetching: A Cooperative Hardware/Software Approach

Leeor answered Oct 20 '22 12:10