To my knowledge, neither the MMU nor the TLB is shared between the two hardware threads of a hyper-threaded core on Intel x86_64.
However, if two threads that don't share an address space are scheduled onto the same physical core, how do they run?
I think that in that case the threads have no chance of hitting in the TLB, because each thread has its own address space.
If so, performance would be degraded considerably, in my opinion.
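To check whether this really hurts, I suppose one could pin two unrelated processes (so they have separate address spaces) to the two sibling logical CPUs of one core and watch TLB-miss counters with something like perf stat -e dTLB-load-misses. Below is a rough Linux sketch of such a test; the sibling CPU numbers 0 and 4 are only placeholders for whatever /sys/devices/system/cpu/cpu0/topology/thread_siblings_list reports on the actual machine:

    /* Hedged sketch: run two processes with separate address spaces on the two
     * logical CPUs of one physical core, each touching its own pages, so TLB
     * behaviour can be observed with external tools (e.g. perf). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define PAGES (1 << 16)              /* 64K pages * 4 KiB = 256 MiB */
    #define PAGE_SIZE 4096

    static void pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            exit(1);
        }
    }

    static void walk_pages(void)
    {
        /* Each forked child gets its own mapping, i.e. its own address space. */
        volatile char *buf = malloc((size_t)PAGES * PAGE_SIZE);
        if (!buf) {
            perror("malloc");
            exit(1);
        }
        for (int iter = 0; iter < 100; iter++)
            for (size_t i = 0; i < (size_t)PAGES; i++)
                buf[i * PAGE_SIZE] += 1;  /* one touch per page -> one TLB lookup */
    }

    int main(void)
    {
        int cpus[2] = {0, 4};  /* assumed hyper-thread siblings; adjust to your topology */
        for (int i = 0; i < 2; i++) {
            if (fork() == 0) {
                pin_to_cpu(cpus[i]);
                walk_pages();
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }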
To be able to work with Flash and SRAM memory it is necessary to initialize the MMU (memory management unit). Each core has its own MMU.
Hyper-Threading Technology (HTT), introduced by Intel almost 15 years ago, was designed to increase the performance of CPU cores. Intel explains that HTT uses processor resources more efficiently and enables multiple threads to run on each core.
Hyper-threading is a technique by which a CPU presents each of its physical cores to the operating system as multiple virtual cores, which the OS treats as if they were real physical cores. These virtual cores are also called threads [1]. Most of Intel's 2-core CPUs use this technique to expose 4 threads, i.e. 4 virtual cores.
With HTT, one physical core appears as two processors to the operating system, allowing two processes to be scheduled on it concurrently. In addition, two or more processes can use the same resources: if the resources needed by one process are not available, another process can continue running if its resources are available.
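This logical-to-physical mapping is visible to software. As a hedged illustration (Linux-specific; it assumes the usual sysfs layout), the following small C program prints which logical CPUs the kernel reports as sibling threads of the same physical core:

    /* Print the hyper-thread siblings of each logical CPU, as reported by
     * Linux in sysfs. On a 2-core/4-thread HT CPU this typically shows two
     * pairs of sibling logical CPUs. */
    #include <stdio.h>

    int main(void)
    {
        char path[128], line[64];
        for (int cpu = 0; ; cpu++) {
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                     cpu);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                      /* no such CPU: stop enumerating */
            if (fgets(line, sizeof(line), f))
                printf("logical CPU %d shares a physical core with: %s", cpu, line);
            fclose(f);
        }
        return 0;
    }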
How the TLBs are organized in Intel processors depends on the microarchitecture: a given TLB may be shared between the two logical processors of a core, partitioned between them, or replicated for each of them.
It's not clear to me how the TLBs are organized in any of the Intel Atom microarchitectures. I think that the L1 DTLB and STLB (in Goldmont Plus) or L2 DTLB (in earlier microarchitectures) are shared. According to Section 8.7.13.2 of the Intel SDM V3 (October 2019):
In processors supporting Intel Hyper-Threading Technology, data cache TLBs are shared. The instruction cache TLB may be duplicated or shared in each logical processor, depending on implementation specifics of different processor families.
This is not entirely accurate, though, since an ITLB can be partitioned as well.
I don't know about the ITLBs in Intel Atoms.
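On processors that implement it, CPUID leaf 0x18 (Deterministic Address Translation Parameters) describes each TLB structure, including how many logical processors share it, so this can at least be queried on a given part. Here is a rough sketch; the bit-field positions follow my reading of the SDM's description of leaf 18H, so verify them against your SDM revision:

    /* Hedged sketch: enumerate CPUID leaf 0x18 and print, for each TLB
     * structure it describes, the type, level and number of logical
     * processors sharing it. Requires GCC/Clang on x86; older CPUs may not
     * implement this leaf. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx;

        if (!__get_cpuid_count(0x18, 0, &eax, &ebx, &ecx, &edx)) {
            puts("CPUID leaf 0x18 not supported");
            return 1;
        }
        unsigned max_subleaf = eax;

        for (unsigned sub = 0; sub <= max_subleaf; sub++) {
            __get_cpuid_count(0x18, sub, &eax, &ebx, &ecx, &edx);
            unsigned type = edx & 0x1f;               /* 1=DTLB, 2=ITLB, 3=unified */
            if (type == 0)
                continue;                             /* invalid subleaf */
            unsigned level   = (edx >> 5) & 0x7;
            unsigned sharing = ((edx >> 14) & 0xfff) + 1;  /* logical CPUs sharing it */
            printf("subleaf %u: type=%u level=%u shared-by=%u  raw ebx=%#x ecx=%#x\n",
                   sub, type, level, sharing, ebx, ecx);
        }
        return 0;
    }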
(By the way, in older AMD processors, all the TLBs are replicated per core. See: Physical core and Logical cores on different cpu AMD/Intel.)
When a TLB is shared, each TLB entry is tagged with the ID of the logical processor that allocated it (a single-bit value, which is different from the process-context identifier; the latter can be disabled or may not be supported). If another thread gets scheduled to run on a logical core and accesses a different virtual address space than the previous thread, the OS has to load the base physical address of the new address space's first-level paging structure into CR3. Whenever CR3 is written, the core automatically flushes all entries in all shared TLBs that are tagged with the ID of that logical core. Other operations may trigger this flushing as well.
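As a rough illustration of what writing to CR3 amounts to at the instruction level (kernel-mode only, with PCIDs not in use; the structure and field names are made up for the example):

    #include <stdint.h>

    /* MOV to CR3 is privileged: this only works in ring 0, i.e. inside the
     * kernel. After the write, the core drops the stale non-global
     * translations held for this logical processor; in a shared TLB, those
     * are the entries tagged with this logical processor's ID. */
    static inline void load_cr3(uint64_t top_level_table_phys)
    {
        __asm__ volatile("mov %0, %%cr3" : : "r"(top_level_table_phys) : "memory");
    }

    /* Switching from process A to process B (illustrative field name):
     *     load_cr3(process_b->first_level_table_physical_address);      */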
Partitioned and replicated TLBs don't need to be tagged with logical core IDs.
If process-context identifiers (PCIDs) are supported and enabled, logical core IDs are not used because PCIDs are more powerful. Note that partitioned and replicated TLBs are tagged with PCIDs.
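For completeness, a small sketch of how a CR3 value is put together when CR4.PCIDE is enabled (field positions per the SDM's description of CR3; the helper name is only illustrative):

    #include <stdint.h>

    /* Bits 11:0 of CR3 carry the PCID, the higher bits (up to MAXPHYADDR) the
     * page-aligned physical address of the first-level paging structure, and
     * bit 63, when set on the MOV to CR3, asks the CPU not to flush the TLB
     * entries tagged with that PCID. */
    static inline uint64_t make_cr3(uint64_t table_phys, uint16_t pcid, int no_flush)
    {
        uint64_t cr3 = (table_phys & ~0xfffULL)   /* page-aligned physical address */
                     | (pcid & 0xfffULL);         /* 12-bit process-context ID */
        if (no_flush)
            cr3 |= 1ULL << 63;                    /* keep this PCID's TLB entries */
        return cr3;
    }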
Related: Address translation with multiple pagesize-specific TLBs.
(Note that there are other paging structure caches and they are organized similarly.)
(Note that usually the TLB is considered to be part of the MMU. The Wikipedia article on MMU shows a figure from an old version of a book that indicates that they are separate. However, the most recent version of the book has removed the figure and says that the TLB is part of the MMU.)