Does `clflush`¹ also flush associated TLB entries? I would assume not, since `clflush` operates at a cache-line granularity while TLB entries exist at the (much larger) page granularity — but I am prepared to be surprised.

¹ ... or `clflushopt`, although one would reasonably assume their behaviors are the same.
I think it's safe to assume no; baking `invlpg` into `clflush` sounds like an insane design decision that I don't think anyone would make. You often want to invalidate multiple lines in a page. There's also no apparent benefit: flushing the TLB as well doesn't make it any easier to implement data-cache flushing.

Even just dropping the final TLB entry (without necessarily invalidating any page-directory caching) would be weaker than `invlpg` but still not make sense.
All modern x86s use caches with physical indexing/tagging, not virtual. (VIPT L1d caches are really PIPT with free translation of the index, because the index is taken from address bits that are part of the offset within a page.) And even if caches were virtual, invalidating TLB entries would require invalidating virtual caches, but not the other way around.
According to IACA, `clflush` is only 2 uops on HSW-SKL, and 4 uops (including micro-fusion) on NHM-IVB. So it's not even microcoded on Intel.

IACA doesn't model `invlpg`, but I assume it's more uops. (And it's privileged, so it's not totally trivial to test.) It's remotely possible those extra uops on pre-HSW were for TLB invalidation.

I don't have any info on AMD.
The fact that `invlpg` is privileged is another reason to expect `clflush` not to be a superset of it. `clflush` is unprivileged. Presumably it's only for performance reasons that `invlpg` is restricted to ring 0.

But `invlpg` won't page-fault, so user-space could use it to invalidate kernel TLB entries, delaying real-time processes and interrupt handlers. (`wbinvd` is privileged for similar reasons: it's very slow, and I think not interruptible.) `clflush` does fault on illegal addresses, so it wouldn't open up that denial-of-service vulnerability. You could `clflush` the shared VDSO page, though.
Unless there's some reason why a CPU would want to expose `invlpg` to user-space (by baking it into `clflush`), I really don't see why any vendor would do it.

With non-volatile DIMMs in the future of computing, it's even less likely that any future CPUs will make it super-slow to loop over a range of memory doing `clflush`. You'd expect most software using memory-mapped NV storage to use `clflushopt`, but I'd expect CPU vendors to make `clflush` as fast as possible, too.