Does a hyper-threaded core share MMU and TLB?

To my knowledge, neither the MMU nor the TLB is shared between the two logical processors of a hyper-threaded core on Intel x86_64.

But then, if two threads that do not share an address space are scheduled onto the same physical core, how do they run?

I think that in this case the threads never hit in the TLB, because each thread has its own address space.

If so, performance would be significantly degraded, in my opinion.

asked Jul 16 '18 by Jonggyu Park




1 Answer

The TLBs are organized in Intel processors as follows:

  • Intel NetBurst (the first to support HT): The ITLB is replicated. The DTLB is competitively shared.
  • Intel Nehalem (the second to support HT), Westmere, Sandy Bridge, and Ivy Bridge: The huge page ITLB is replicated. The small page ITLB is statically partitioned. All DTLBs are competitively shared.
  • Intel Haswell, Broadwell, and Skylake: The small page ITLB is dynamically partitioned. The huge page ITLB is replicated. Table 2-12 of the optimization manual (September 2019) says that the policy is "fixed" for the other TLBs. I thought this means static partitioning. But according to the experimental results of the paper titled Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks (Section 6), it appears that "fixed" means competitive sharing. That would be consistent with earlier and later microarchitectures.
  • Sunny Cove: The ITLBs are statically partitioned. All DTLBs and the STLB are competitively shared.
  • AMD Zen, Zen+, Zen 2 (Family 17h): All TLBs are competitively shared.

References:

  • For NetBurst: https://software.intel.com/en-us/articles/introduction-to-hyper-threading-technology.
  • For the other Intel microarchitectures: The information can be found in the Optimization Reference Manual.
  • For the AMD microarchitectures: The information can be found in the Software Optimization Guide.

It's not clear to me how the TLBs are organized in any of the Intel Atom microarchitectures. I think that the L1 DTLB and STLB (in Goldmont Plus) or L2 DTLB (in earlier microarchitectures) are shared. According to Section 8.7.13.2 of the Intel SDM V3 (October 2019):

In processors supporting Intel Hyper-Threading Technology, data cache TLBs are shared. The instruction cache TLB may be duplicated or shared in each logical processor, depending on implementation specifics of different processor families.

However, this is not fully accurate, since an ITLB can be partitioned as well.

I don't know about the ITLBs in Intel Atoms.

(By the way, in older AMD processors, all the TLBs are replicated per core. See: Physical core and Logical cores on different cpu AMD/Intel.)

When a TLB is shared, each TLB entry is tagged with the ID of the logical processor that allocated it. (This ID is a single bit and is distinct from the process-context identifier, which can be disabled or may not be supported.) If another thread gets scheduled on a logical core and accesses a different virtual address space than the previous thread, the OS has to load the base physical address of the first-level paging structure into CR3. Whenever CR3 is written, the core automatically flushes all entries in all shared TLBs that are tagged with the ID of that logical core. Other operations may also trigger this flushing.

Partitioned and replicated TLBs don't need to be tagged with logical core IDs.

If process-context identifiers (PCIDs) are supported and enabled, logical core IDs are not used because PCIDs are more powerful. Note that partitioned and replicated TLBs are tagged with PCIDs.

Related: Address translation with multiple pagesize-specific TLBs.

(Note that there are other paging structure caches and they are organized similarly.)

(Note that usually the TLB is considered to be part of the MMU. The Wikipedia article on MMU shows a figure from an old version of a book that indicates that they are separate. However, the most recent version of the book has removed the figure and says that the TLB is part of the MMU.)

answered Oct 11 '22 by Hadi Brais