With Hyper-Threading, which cache level (L1/L2/L3) do the threads of one physical core use to exchange data?

Tags:

x86

multithreading

x86-64

hyperthreading

smt

Does Hyper-Threading allow the L1 cache to be used to exchange data between two threads that execute simultaneously on a single physical core, but on two virtual (logical) cores?

Assume that both threads belong to the same process, i.e. they share the same address space.

Page 85 (2-55) - Intel® 64 and IA-32 Architectures Optimization Reference Manual: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

2.5.9 Hyper-Threading Technology Support in Intel® Microarchitecture Code Name Nehalem

...

Deeper buffering and enhanced resource sharing/partition policies:

Replicated resource for HT operation: register state, renamed return stack buffer, large-page ITLB.

Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, small-page ITLB are statically allocated between two logical processors.

Competitively-shared resource during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB.

Alternating during HT operation: front end operation generally alternates between two logical processors to ensure fairness.

HT unaware resources: execution units.

751

asked Jan 06 '15 11:01

Alex

1 Answer

The Intel 64 and IA-32 Architectures Optimization Reference Manual briefly describes how processor resources are shared between the HT threads on a core in chapter 2.3.9. It is documented for the Nehalem architecture and is getting stale, but it is fairly likely to still be relevant for current architectures, since the partitioning is logically consistent:

Duplicated for each HT thread: the registers, the return stack buffer, the large-page ITLB
Statically allocated for each HT thread: the load, store and re-order buffers, the small-page ITLB
Competitively shared between HT threads: the reservation station, the caches, the fill buffers, DTLB0 and STLB.

Your question matches the 3rd bullet. In the very specific case where both HT threads execute code from the same process, which happens more or less by accident, you can generally expect L1 and L2 to contain data retrieved by one HT thread that is useful to the other. Keep in mind that the unit of storage in the caches is a cache line, 64 bytes.

Just in case: this is not by itself a good reason to pursue a thread-scheduling approach that favors getting two HT threads to execute on the same core, assuming your OS supports that. An HT thread generally runs quite a bit slower than a thread that gets the core to itself. 30% is the usual number bandied about, YMMV.
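If you want to experiment with this anyway, you first need to know which logical CPUs are HT siblings. Here is a minimal sketch, assuming Linux, where sysfs exposes the siblings of a logical CPU in `thread_siblings_list`. The helper names `parse_cpu_list`, `ht_siblings` and `pin_current_thread` are made up for this example and are not part of any library. On other operating systems the sketch simply reports no siblings and refuses to pin.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def ht_siblings(cpu=0):
    """Return the logical CPUs that share a physical core with `cpu`.

    Assumption: Linux sysfs layout. Returns [] if the file cannot be read.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return []


def pin_current_thread(cpu):
    """Try to restrict the caller to one logical CPU; return True on success.

    os.sched_setaffinity only exists on some platforms (notably Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return False
    return True


if __name__ == "__main__":
    siblings = ht_siblings(0)
    print("logical CPUs sharing a core with cpu0:", siblings)
```

With two worker threads, each one would call `pin_current_thread` with a different entry of `ht_siblings(0)`. Whether sharing the core this way actually pays off is something to measure, for the reasons given above.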

183

answered Nov 15 '22 10:11

Hans Passant
