Does `clflush`¹ also flush associated TLB entries? I would assume not, since `clflush` operates at a cache-line granularity while TLB entries exist at the (much larger) page granularity — but I am prepared to be surprised.

¹ ... or `clflushopt`, although one would reasonably assume their behaviors are the same.
I think it's safe to assume no; baking `invlpg` into `clflush` sounds like an insane design decision that I don't think anyone would make. You often want to invalidate multiple lines in a page. There's also no apparent benefit: flushing the TLB as well doesn't make it any easier to implement data-cache flushing.

Even just dropping the final TLB entry (without necessarily invalidating any page-directory caching) would be weaker than `invlpg` but still not make sense.
All modern x86s use caches with physical indexing/tagging, not virtual. (VIPT L1d caches are really PIPT with free translation of the index, because the index is taken from address bits that are part of the offset within a page.) And even if caches were virtual, invalidating TLB entries would require invalidating virtual caches, but not the other way around.
According to IACA, `clflush` is only 2 uops on HSW-SKL, and 4 uops (including micro-fusion) on NHM-IVB. So it's not even microcoded on Intel.

IACA doesn't model `invlpg`, but I assume it's more uops. (And it's privileged, so it's not totally trivial to test.) It's remotely possible those extra uops on pre-HSW were for TLB invalidation.

I don't have any info on AMD.
The fact that `invlpg` is privileged is another reason to expect `clflush` not to be a superset of it. `clflush` is unprivileged. Presumably it's only for performance reasons that `invlpg` is restricted to ring 0.

But `invlpg` won't page-fault, so user-space could use it to invalidate kernel TLB entries, delaying real-time processes and interrupt handlers. (`wbinvd` is privileged for similar reasons: it's very slow, and I think not interruptible.) `clflush` does fault on illegal addresses, so it wouldn't open up that denial-of-service vulnerability. You could `clflush` the shared VDSO page, though.
Unless there's some reason why a CPU would want to expose `invlpg` to user-space (by baking it into `clflush`), I really don't see why any vendor would do it.

With non-volatile DIMMs in the future of computing, it's even less likely that any future CPUs will make it super-slow to loop over a range of memory doing `clflush`. You'd expect most software using memory-mapped NV storage to use `clflushopt`, but I'd expect CPU vendors to make `clflush` as fast as possible, too.