Does Hyper-Threading allow the L1 cache to be used to exchange data between two threads that execute simultaneously on a single physical core, but on two virtual (logical) cores?
With the proviso that both belong to the same process, i.e. share the same address space.
Page 85 (2-55) - Intel® 64 and IA-32 Architectures Optimization Reference Manual: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
2.5.9 Hyper-Threading Technology Support in Intel® Microarchitecture Code Name Nehalem
...
Deeper buffering and enhanced resource sharing/partition policies:
Replicated resource for HT operation: register state, renamed return stack buffer, large-page ITLB.
Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, small-page ITLB are statically allocated between two logical processors.
Competitively-shared resource during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB.
Alternating during HT operation: front end operation generally alternates between two logical processors to ensure fairness.
HT unaware resources: execution units.
Caches are graded into levels: Level 1 (L1), Level 2 (L2) and Level 3 (L3). L1 is usually part of the CPU chip itself and is both the smallest and the fastest to access; its size is typically between 8 KB and 64 KB. L2 and L3 caches are bigger than L1; they are extra caches that sit between the CPU and the RAM.
L1 is "level-1" cache memory, usually built onto the microprocessor chip itself. For example, the Intel MMX microprocessor comes with 32 thousand bytes of L1. L2 (that is, level-2) cache memory is on a separate chip (possibly on an expansion card) that can be accessed more quickly than the larger "main" memory.
L1 is located on the CPU chip, and L2 sits between the processor and main memory. Note, however, that in some systems L2 is on the CPU chip while in others it is on the motherboard itself; L3 was likewise placed on the motherboard in older designs, whereas modern multi-core processors such as Nehalem integrate it on the CPU die.
Every core of a multi-core processor has a dedicated L1 cache, which is usually not shared with other cores. The L2 cache and higher-level caches may be shared between cores.
The Intel Architecture Software Optimization manual has a brief description of how processor resources are shared between HT threads on a core in chapter 2.3.9. It is documented for the Nehalem architecture and is getting stale, but it is fairly likely to still be relevant for current ones, since the partitioning is logically consistent:
Duplicated for each HT thread: the registers, the return stack buffer, the large-page ITLB
Statically allocated for each HT thread: the load, store and re-order buffers, the small-page ITLB
Competitively shared between HT threads: the reservation station, the caches, the fill buffers, DTLB0 and STLB.
Your question matches the 3rd bullet. In the very specific case where both HT threads execute code from the same process, which is a bit of an accident, you can generally expect L1 and L2 to contain data retrieved by one HT thread that is useful to the other. Keep in mind that the unit of storage in the caches is a cache line of 64 bytes. Just in case: this is not otherwise a good reason to pursue a thread-scheduling approach that favors getting two HT threads to execute on the same core, assuming your OS would even support that. An HT thread generally runs quite a bit slower than a thread that gets the core to itself; 30% is the usual number bandied about, YMMV.