Although I have read about the movntdqa instruction in this context, I haven't figured out a clean way to mark a memory range as uncacheable, or to read data without polluting the cache. I want to do this from gcc. My main goal is to swap two random locations in a large array. I'm hoping to accelerate this operation by avoiding caching, since there is very little data reuse.
I think what you're describing is Memory Type Range Registers (MTRRs). You can control these under Linux (if available and you're root, i.e. user 0) using /proc/mtrr / ioctl(2); see here for an example. As it works on a physical address range, I think you're going to have a hard time using it in a reasonable way.
A better way is to look at the compiler intrinsics GCC provides and find one or more that express your intent. Have a look at Ulrich Drepper's series on "What every programmer should know about memory", in particular part 5, which deals with bypassing the cache. It looks like _mm_prefetch(ptr, _MM_HINT_NTA) might be appropriate for your needs.
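As a rough illustration of the prefetch suggestion, here is a minimal sketch for x86 with GCC. The helper name swap_pairs_nta, the index arrays and the LOOKAHEAD distance are made up for this example and are not from Drepper's text. It assumes the random index pairs are generated in advance, so that a non-temporal prefetch can be issued a few swaps ahead of the swap that needs the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Assumption: how far ahead to prefetch; the right value must be measured. */
enum { LOOKAHEAD = 4 };

/* Hypothetical helper: for each k, swap a[src[k]] with a[dst[k]].
 * While working on pair k, hint that the lines needed by pair
 * k + LOOKAHEAD should be fetched with minimal cache pollution. */
static void swap_pairs_nta(uint32_t *a, const size_t *src,
                           const size_t *dst, size_t n_pairs)
{
    for (size_t k = 0; k < n_pairs; k++) {
        if (k + LOOKAHEAD < n_pairs) {
            _mm_prefetch((const char *)&a[src[k + LOOKAHEAD]], _MM_HINT_NTA);
            _mm_prefetch((const char *)&a[dst[k + LOOKAHEAD]], _MM_HINT_NTA);
        }
        uint32_t tmp = a[src[k]];
        a[src[k]] = a[dst[k]];
        a[dst[k]] = tmp;
    }
}
```

The prefetch is only a hint, and how the CPU treats _MM_HINT_NTA differs between microarchitectures, so compare this against a plain swap loop before keeping it.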
As always when it comes to performance - measure, measure, measure. Drepper's series has excellent parts detailing how this can be done (part 7) as well as code examples and other strategies to try when speeding up the memory performance of your code.
All good advice from user786653; the Ulrich Drepper article especially. I'll add:
Uncached or not, the virtual memory hardware still has to look up page info in the TLB, which has a limited capacity. Don't underestimate the impact of TLB thrashing on random access performance. If you aren't already using them, see the results here for why you really want huge pages for your array data and not the teeny 4K default (which goes back to the days of "640K ought to be enough for anybody"). Of course, if you're talking about really huge arrays, bigger than even a TLB full of 2MB pages can reference, even that won't help with this.
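One way to ask for huge pages on Linux, sketched below, is to map the array anonymously and then request transparent huge pages with madvise(2). This is an assumption about your setup: it needs a Linux kernel with THP enabled, and the helper names alloc_thp and free_thp are made up for illustration. The request is advisory, so the kernel may still back the range with 4K pages.

```c
#define _GNU_SOURCE     /* for MAP_ANONYMOUS and MADV_HUGEPAGE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `bytes` of zeroed anonymous memory and ask the
 * kernel to back it with transparent huge pages. Returns NULL on failure. */
static void *alloc_thp(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory only: a failure here just means we keep the default pages. */
    (void)madvise(p, bytes, MADV_HUGEPAGE);
#endif
    return p;
}

/* Release a mapping obtained from alloc_thp(). */
static void free_thp(void *p, size_t bytes)
{
    if (p != NULL)
        munmap(p, bytes);
}
```

Only the 2MB-aligned parts of the mapping can become huge pages, so this makes most sense for arrays that are many megabytes in size. Whether huge pages were actually used can be checked through the AnonHugePages field in /proc/meminfo.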
What have you got against the 'nt' instructions (e.g. the _mm_stream_ps intrinsic)? I'm unconvinced declaring pages uncached will get you any better performance than appropriate use of those, and they're much easier to use than the alternatives. Would be very interested to see evidence to the contrary though.
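For reference, a minimal sketch of what using _mm_stream_ps looks like. The helper name stream_copy_ps is made up for this example, and it assumes that dst is 16-byte aligned and that n is a multiple of 4. The stores are non-temporal, so the written data is not expected to stay in the cache.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_loadu_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: copy n floats from src to dst using non-temporal
 * stores. Assumptions: dst is 16-byte aligned, n is a multiple of 4. */
static void stream_copy_ps(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  /* ordinary (cached) load */
        _mm_stream_ps(dst + i, v);         /* store that bypasses the cache */
    }
    /* Streaming stores are weakly ordered; fence before other threads
     * are allowed to rely on the data. */
    _mm_sfence();
}
```

Note that only the stores are non-temporal here; the loads still go through the cache. Whether this beats plain stores for a scattered swap workload is something to measure rather than assume.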