
Does Linux use x86 CPU's PCID feature for TLB? If not, why?

I wrote a kernel module to check CR4.PCIDE, and it is not set. Why doesn't Linux use this feature to reduce the slowdown caused by TLB invalidation and cache pollution?

W.Sun asked Nov 22 '13 22:11



2 Answers

Update: This changed around the 4.15 timeframe due to the Meltdown and Spectre attacks in late 2017 and early 2018. See the other answer for details.

Note: I'm not a Linux developer

For Intel's "Process Context Identifiers", there's a limit of 4096 IDs (the PCID is a 12-bit field). This means that once there are more than 4096 processes you have to manage the IDs yourself (e.g. maybe do a "least recently used" thing, so that if a process that currently doesn't have an ID needs to be executed then the ID is taken from some other process and reused).
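To make the "least recently used" idea concrete, here is a minimal user-space sketch in C. It is not how any real kernel does it; the names (pcid_table, pcid_get, NO_OWNER) are made up for illustration, and the linear scan is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PCIDS 4096u   /* a 12-bit PCID gives 2^12 IDs */
#define NO_OWNER  0u      /* hypothetical marker: slot has never been assigned */

struct pcid_table {
    uint64_t owner[NUM_PCIDS];      /* process id that owns each PCID, or NO_OWNER */
    uint64_t last_used[NUM_PCIDS];  /* logical timestamp of the most recent use */
    uint64_t clock;                 /* incremented on every lookup */
};

/*
 * Return the PCID to use for process `proc` (must be nonzero).
 * *needs_flush is set to 1 when the PCID was newly assigned or taken from
 * another process, because stale TLB entries tagged with it must be flushed.
 */
static uint16_t pcid_get(struct pcid_table *t, uint64_t proc, int *needs_flush)
{
    uint32_t victim = 0;

    assert(proc != NO_OWNER);
    t->clock++;

    for (uint32_t i = 0; i < NUM_PCIDS; i++) {
        if (t->owner[i] == proc) {          /* already has an ID: keep it */
            t->last_used[i] = t->clock;
            *needs_flush = 0;
            return (uint16_t)i;
        }
        if (t->last_used[i] < t->last_used[victim])
            victim = i;                     /* remember the least recently used slot */
    }

    /* No ID yet: take the least recently used one (free slots have timestamp 0). */
    t->owner[victim] = proc;
    t->last_used[victim] = t->clock;
    *needs_flush = 1;
    return (uint16_t)victim;
}
```

With a zero-initialised table, the first 4096 distinct processes each get a fresh ID; the next new process takes the ID of whichever process was used longest ago, and that process has to flush and get a new ID when it runs again.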

The other thing that comes into it is "TLB shootdown" on multi-CPU systems. These can be a little expensive, so people do tricks to avoid them. For example, if a process only has one thread then it can only be running on one CPU, and you know there's no need to send an IPI to other CPUs (interrupting them and asking them to do the "TLB shootdown"). Once you start using PCIDs you can't be sure that other CPUs don't still have TLB entries for that process, so you can't use these tricks to avoid "TLB shootdown". It also means that (in theory, for badly implemented PCID support) the performance you gain from PCID may be less than the performance you lose to the extra TLB shootdowns and ID management overhead, resulting in a net loss.

Mostly what I'm saying is that it's a little complicated to add support for PCID (it's not like you can just set a flag in CR4 and forget about it). You'd have to do some research (experiments, prototypes, benchmarking) to determine the most effective way of implementing it. For a large/complex/old kernel (like Linux) it'd be even more complicated as you'd have to be careful not to upset something else by accident. The other thing is that this feature is relatively new (it's only existed for a few years if I remember correctly) and isn't supported by a lot of CPUs (e.g. anything a little older, and anything from AMD).

Basically, I'd assume that it comes down to "time vs. benefits" (or, not enough time for a small performance improvement on a limited number of CPUs).

Brendan answered Sep 20 '22 13:09



Yes! Recent versions of the Linux kernel have PCID support. At the time this question was asked, this support didn't exist, but it was added near the end of 2017, starting with the 4.14 kernel. You can follow some of the original patch discussion in this LKML thread.

The change doesn't actually associate a unique PCID with each process, since there are a limited number of them, nor does it try to assign them on a most-frequently-used basis. Instead it keeps a small PCID cache per CPU, so that several processes running on a given CPU are likely to be able to use the PCID mechanism to avoid TLB flush overhead.
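As a rough illustration of what a per-CPU PCID cache could look like, here is a small user-space sketch in C. This is not the kernel's code: the names (cpu_pcid_cache, choose_slot), the slot count of 6 and the round-robin replacement are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS_PER_CPU 6u   /* illustrative size; deliberately much smaller than 4096 */

struct cpu_pcid_cache {
    uint64_t ctx_id[SLOTS_PER_CPU];  /* address space cached in each slot, 0 = empty */
    unsigned next;                   /* next slot to recycle, round robin */
};

/*
 * Pick a slot (i.e. a PCID) on this CPU for address space `ctx` (nonzero).
 * *needs_flush is 0 when the slot still belongs to `ctx`, so its TLB entries
 * can be reused; it is 1 when the slot was recycled and must be flushed.
 */
static unsigned choose_slot(struct cpu_pcid_cache *c, uint64_t ctx, int *needs_flush)
{
    assert(ctx != 0);

    for (unsigned i = 0; i < SLOTS_PER_CPU; i++) {
        if (c->ctx_id[i] == ctx) {       /* ran here recently: entries still valid */
            *needs_flush = 0;
            return i;
        }
    }

    unsigned slot = c->next;             /* otherwise recycle the next slot */
    c->next = (c->next + 1) % SLOTS_PER_CPU;
    c->ctx_id[slot] = ctx;
    *needs_flush = 1;
    return slot;
}
```

The point of the design is that a handful of processes taking turns on one CPU keep hitting their own slot, while nothing has to track a system-wide ID for every process.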

This became more relevant recently, since a series of vulnerabilities were found which allow unprivileged user code to read kernel memory, against which the KPTI patches were deployed. These patches can have a significant performance impact, since the user-level TLB entries may be invalidated on every entry into the kernel. With PCID support, the impact is reduced because the user-level TLB entries are preserved.
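The reason PCID helps here comes down to how CR3 is written. With CR4.PCIDE set, bits 11:0 of CR3 carry the PCID, and setting bit 63 of the value being loaded asks the CPU not to flush the entries tagged with that PCID. The C sketch below only builds such a value in user space; the macro and function names are my own, and actually loading CR3 is privileged and not shown.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFull       /* bits 11:0: PCID, when CR4.PCIDE = 1 */
#define CR3_NOFLUSH   (1ull << 63)   /* bit 63 of the written value: keep this PCID's entries */

/*
 * Build the value that would be loaded into CR3 for the page table root at
 * physical address `pgd_phys`, tagged with `pcid`. Pass noflush = 1 to ask
 * the CPU to preserve existing TLB entries for that PCID.
 */
static uint64_t make_cr3(uint64_t pgd_phys, uint16_t pcid, int noflush)
{
    assert((pgd_phys & CR3_PCID_MASK) == 0);  /* root table is 4 KiB aligned */
    assert(pcid <= CR3_PCID_MASK);

    uint64_t cr3 = pgd_phys | pcid;
    if (noflush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}
```

For example, if the kernel and user halves of a process were given two different PCIDs, switching between their page tables with the no-flush bit set would leave the user-level entries in the TLB across a system call.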


An older version of this answer, written when PCID support wasn't yet available in released kernels, follows:

Not yet, but it seems like something might be in the works. See the thread starting around here on the LKML. In particular, there are proposed solutions to the cross-core TLB shootdown issues, among others:

If, when receiving a TLB shootdown for a non-current PCID, we just flush all the entries for that PCID and remove the CPU from the mm's cpu_vm_mask_var, we will never receive more than one shootdown IPI for a non-current mm, but we will still get the benefits of TLB longevity when dealing with eg. pipe workloads where tasks take turns running on the same CPU.
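The proposal quoted above can be sketched in a few lines of C. This is only a user-space model under my own assumptions: the struct layouts are invented, a 64-bit mask stands in for cpu_vm_mask_var, and counters stand in for the actual flush instructions.

```c
#include <assert.h>
#include <stdint.h>

struct mm {
    uint64_t cpu_mask;   /* bit n set: CPU n may hold TLB entries for this mm */
    uint16_t pcid;       /* tag used for this mm's TLB entries */
};

struct cpu {
    unsigned id;                  /* CPU number, below 64 in this model */
    const struct mm *current_mm;  /* address space currently running here */
    unsigned page_flushes;        /* stand-in for invalidating the affected pages */
    unsigned pcid_flushes;        /* stand-in for flushing everything under one PCID */
};

/* Does a shootdown for `mm` still need to interrupt CPU `cpu_id`? */
static int needs_ipi(const struct mm *mm, unsigned cpu_id)
{
    return (int)((mm->cpu_mask >> cpu_id) & 1);
}

/* Handler run on `cpu` when it receives a shootdown IPI for `mm`. */
static void handle_shootdown(struct cpu *cpu, struct mm *mm)
{
    assert(cpu->id < 64);

    if (cpu->current_mm == mm) {
        cpu->page_flushes++;      /* mm is live here: flush only what changed */
        return;
    }

    /* Non-current mm: drop all of its entries and stop receiving its IPIs. */
    cpu->pcid_flushes++;
    mm->cpu_mask &= ~(1ull << cpu->id);
}
```

After the first shootdown for a non-current mm, the CPU is no longer in the mask, so later changes to that mm don't interrupt it until the mm is scheduled there again.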

You can also glean from that thread that address-space identifiers have long been used on other Linux architectures.

BeeOnRope answered Sep 17 '22 13:09
