In recent Intel ISA documents, the lfence instruction has been defined as serializing the instruction stream (preventing out-of-order execution across it). In particular, the description of the instruction includes this line:
Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes.
Note that this applies to all instructions, not just memory load instructions, making lfence more than just a memory ordering fence.
Although this now appears in the ISA documentation, it isn't clear whether it is "architectural", i.e., to be obeyed by all x86 implementations, or Intel-specific. In particular, do AMD processors also treat lfence as serializing the instruction stream?
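One place this property matters in practice is timing code with RDTSC, which the kernel code quoted further down also refers to. The following is only a minimal sketch, assuming an x86 target and a GCC- or Clang-style `x86intrin.h`; the helper name `tsc_fenced` is hypothetical. Whether the fences actually order non-load instructions on a given AMD processor depends on the behaviour discussed in the answers.

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_lfence() and __rdtsc() on GCC/Clang, x86 only */

/* Hypothetical helper: read the time-stamp counter with an LFENCE on each side. */
static inline uint64_t tsc_fenced(void)
{
    _mm_lfence();            /* intended to wait until earlier instructions have completed locally */
    uint64_t t = __rdtsc();  /* read the time-stamp counter */
    _mm_lfence();            /* intended to keep later instructions from starting before RDTSC */
    return t;
}
```

If LFENCE is only load serializing on the processor in question, the two fences do not give the ordering the comments describe, which is exactly why the question matters.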
There is an MSR that configures that behaviour:
Description: Set an MSR in the processor so that LFENCE is a dispatch serializing instruction and then use LFENCE in code streams to serialize dispatch (LFENCE is faster than RDTSCP which is also dispatch serializing). This mode of LFENCE may be enabled by setting MSR C001_1029[1]=1.
Effect: Upon encountering an LFENCE when the MSR bit is set, dispatch will stop until the LFENCE instruction becomes the oldest instruction in the machine.
Applicability: All AMD family 10h/12h/14h/15h/16h/17h processors support this MSR. LFENCE support is indicated by CPUID function1 EDX bit 26, SSE2. AMD family 0Fh/11h processors support LFENCE as serializing always but do not support this MSR. AMD plans support for this MSR and access to this bit for all future processors.
(source)
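The applicability rules in the quote can be written down as a small lookup. This is a sketch only: the enum and function names are hypothetical, the family decoding follows the usual CPUID function 1 EAX layout (base family in bits 11:8, extended family in bits 27:20, added together when the base family is 0Fh), and families not named in the quote are reported as unknown rather than guessed.

```c
#include <stdint.h>

/* Hypothetical classification of LFENCE behaviour, taken from the quoted text. */
enum lfence_mode {
    LFENCE_UNAVAILABLE,        /* no SSE2, so no LFENCE */
    LFENCE_ALWAYS_SERIALIZING, /* families 0Fh/11h: no MSR, always serializing */
    LFENCE_NEEDS_MSR_BIT,      /* families 10h/12h/14h/15h/16h/17h: set C001_1029[1] */
    LFENCE_UNKNOWN             /* family not listed in the quoted text */
};

/* Decode the family from CPUID function 1 EAX. */
static uint32_t amd_display_family(uint32_t eax)
{
    uint32_t base = (eax >> 8) & 0xF;
    uint32_t ext = (eax >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* eax and edx are the CPUID function 1 outputs; EDX bit 26 is SSE2. */
static enum lfence_mode amd_lfence_mode(uint32_t eax, uint32_t edx)
{
    if (!(edx & (1u << 26)))
        return LFENCE_UNAVAILABLE;

    switch (amd_display_family(eax)) {
    case 0x0F: case 0x11:
        return LFENCE_ALWAYS_SERIALIZING;
    case 0x10: case 0x12: case 0x14:
    case 0x15: case 0x16: case 0x17:
        return LFENCE_NEEDS_MSR_BIT;
    default:
        return LFENCE_UNKNOWN;
    }
}
```

For example, an EAX value of 0x00800F00 decodes to family 17h and, with SSE2 present, lands in the "needs MSR bit" case.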
AMD's manuals have always described their implementation of LFENCE as a load serializing instruction:
Acts as a barrier to force strong memory ordering (serialization) between load instructions preceding the LFENCE and load instructions that follow the LFENCE.
The original use case for LFENCE was ordering loads from the WC memory type. However, after the speculative execution vulnerabilities were discovered, AMD released a document in January 2018 entitled "Software techniques for managing speculation on AMD processors". This is the first and only document in which MSR C001_1029[1] is mentioned (other bits of C001_1029 are discussed in some AMD documents, but not bit 1). When C001_1029[1] is set to 1, LFENCE behaves as a dispatch serializing instruction (which is more expensive than merely load serializing). Since this MSR is available on most older AMD processors, it seems that the bit has almost always been supported, perhaps because AMD thought they might someday need to match Intel processors' LFENCE behavior.
There are exceptions to the ordering rules of fence instructions, serializing instructions, and instructions that have serializing properties. These exceptions differ subtly between Intel and AMD processors; one example is the CLFLUSH instruction. So AMD and Intel mean slightly different things when they talk about instructions with serializing properties.
One thing not clear to me is the following part of the quote from harlod's answer:
AMD family 0Fh/11h processors support LFENCE as serializing always but do not support this MSR.
This statement is vague because it doesn't clearly say whether LFENCE on AMD families 0Fh and 11h is fully serializing (in AMD terminology) or dispatch serializing (in AMD terminology). Most probably it is dispatch serializing only. The AMD family-specific manuals don't mention LFENCE or MSR C001_1029.
Since v4.15-rc8, the Linux kernel has used the serializing properties of LFENCE on AMD processors. The change consists of two commits, 1 and 2. The following macros were defined:
+#define MSR_F10H_DECFG 0xc0011029
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
The first macro specifies the MSR address and the second specifies the bit offset within that MSR. The following code was added in init_amd (some comments are mine):
/* LFENCE always requires SSE2 */
if (cpu_has(c, X86_FEATURE_XMM2)) {
    unsigned long long val;
    int ret;

    /* The AMD CPU supports LFENCE, but there are three cases to be considered:
     * 1- MSR C001_1029[1] must be set to enable the dispatch
     *    serializing behavior of LFENCE. This can be done
     *    if and only if the MSR is supported.
     * 2- The MSR is not supported (AMD 0Fh/11h). LFENCE is by
     *    default at least dispatch serializing. Nothing needs to
     *    be done.
     * 3- The MSR is supported, but we are running under a hypervisor
     *    that does not support writing that MSR (because perhaps
     *    the hypervisor has not been updated yet). In this case, resort
     *    to the slower MFENCE for serializing RDTSC and use a Spectre
     *    mitigation that does not require LFENCE (i.e., generic retpoline).
     */

    /*
     * A serializing LFENCE has less overhead than MFENCE, so
     * use it for execution serialization. On families which
     * don't have that MSR, LFENCE is already serializing.
     * msr_set_bit() uses the safe accessors, too, even if the MSR
     * is not present.
     */
    msr_set_bit(MSR_F10H_DECFG,
                MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);

    /*
     * Verify that the MSR write was successful (could be running
     * under a hypervisor) and only then assume that LFENCE is
     * serializing.
     */
    ret = rdmsrl_safe(MSR_F10H_DECFG, &val);
    if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) {
        /* A serializing LFENCE stops RDTSC speculation */
        set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
        /*
         * X86_FEATURE_LFENCE_RDTSC is used later to choose a Spectre
         * mitigation
         */
    } else {
        /* MFENCE stops RDTSC speculation */
        set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
    }
}
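The set-then-verify pattern above can be tried out in user space with a stand-in for the MSR. This is a sketch under stated assumptions, not kernel code: `struct fake_msr`, `fake_set_bit`, `fake_read` and `choose_rdtsc_fence` are hypothetical names that only imitate the shape of msr_set_bit() and rdmsrl_safe(), and the "hypervisor ignores writes" behaviour is modelled as a simple flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define DECFG_LFENCE_SERIALIZE_BIT 1
#define DECFG_LFENCE_SERIALIZE     (1ULL << DECFG_LFENCE_SERIALIZE_BIT)

/* Hypothetical stand-in for MSR C001_1029 as the kernel would see it. */
struct fake_msr {
    bool present;        /* false: every access faults (MSR not implemented) */
    bool writes_ignored; /* true: writes are silently dropped (old hypervisor) */
    uint64_t value;
};

/* Imitates a "safe" bit set: nonzero return means the access faulted. */
static int fake_set_bit(struct fake_msr *m, unsigned bit)
{
    if (!m->present)
        return -1;
    if (!m->writes_ignored)
        m->value |= 1ULL << bit;
    return 0;
}

/* Imitates a "safe" read: nonzero return means the access faulted. */
static int fake_read(const struct fake_msr *m, uint64_t *val)
{
    if (!m->present)
        return -1;
    *val = m->value;
    return 0;
}

enum rdtsc_fence { USE_LFENCE_RDTSC, USE_MFENCE_RDTSC };

/* Set the bit, read it back, and trust LFENCE only if the bit stuck. */
static enum rdtsc_fence choose_rdtsc_fence(struct fake_msr *m)
{
    uint64_t val = 0;

    (void)fake_set_bit(m, DECFG_LFENCE_SERIALIZE_BIT);
    if (fake_read(m, &val) == 0 && (val & DECFG_LFENCE_SERIALIZE))
        return USE_LFENCE_RDTSC;
    return USE_MFENCE_RDTSC;
}
```

In this model a writable MSR selects LFENCE, while both a dropped write and a faulting read fall through to the MFENCE branch, mirroring the else path of the kernel snippet.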
In v5.4-rc1, the MSR write verification code was removed, so the code became:
msr_set_bit(MSR_F10H_DECFG,
            MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
The reasoning behind this change is discussed in the commit message. (In summary, it's mostly not needed, and it may not work.)
That document also says:
All AMD family 10h/12h/14h/15h/16h/17h processors support this MSR. LFENCE support is indicated by CPUID function1 EDX bit 26, SSE2. AMD family 0Fh/11h processors support LFENCE as serializing always but do not support this MSR.
But it appears that none of the AMD manuals have been updated yet to mention support for C001_1029[1].
AMD said the following in that document:
AMD plans support for this MSR and access to this bit for all future processors.
This means that C001_1029[1] should be considered as architectural on future AMD processors (with respect to January 2018).