Imagine a CPU (or core) that is superscalar (multiple execution units) and also has hyperthreading (SMT) support.
Why is the number of software threads the CPU can truly execute in parallel typically given by the number of logical cores (i.e. so-called hardware threads) it possesses, and not the total number of execution units it has?
If my understanding is correct, SMT doesn't actually enable true parallel execution; it simply makes context switching much faster and more efficient by duplicating certain parts of the CPU (those that store the architectural state, but not the main execution resources). A superscalar architecture, on the other hand, allows true simultaneous execution of multiple instructions per clock cycle, because the CPU has multiple execution units, i.e. multiple parallel pipelines which can each process a separate thread in true parallel fashion.
So for example, if a CPU has 2 cores, and each core has 2 execution units, shouldn't its hardware concurrency (the number of threads it can truly execute in parallel) be 4? Why is its hardware concurrency instead given by the number of logical cores, when SMT doesn't actually enable true parallel execution?
Hyper-Threading increases the performance of CPU cores: it lets each core run multiple threads (sequences of instructions), which makes the CPU run more efficiently. With its help, the CPU can perform more tasks in the same amount of time.
The important part is that each core (executing with its own instruction counter) can also be superscalar in order to execute each single process more quickly! It is also possible to issue multiple operations per cycle without out-of-order hardware by using what's called a very long instruction word ("VLIW") design, where the compiler packs independent operations into each instruction.
For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture,...
If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four logical processors (for example, on a two-core CPU with 2-way hyper-threading) the same.
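To see the count of logical processors the operating system exposes, here is a minimal Python sketch. It relies on the standard `os.cpu_count()`, which reports logical CPUs (hardware threads), not execution units. The helper name `logical_cpu_count` and the fallback value of 1 are assumptions made for illustration.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical CPUs the OS reports (1 if unknown)."""
    count = os.cpu_count()  # can be None when the count cannot be determined
    return count if count is not None else 1


if __name__ == "__main__":
    # On a 2-core CPU with 2-way SMT enabled this would typically report 4.
    print("logical CPUs visible to the OS:", logical_cpu_count())
```

Note that nothing in this number says how many execution units each core has; that detail is not exposed to software at this level.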
You can't just slam instructions into the execution units.
If you want 2-way SMT you need to keep two architectural states and fetch two instruction streams.
If a company has 100 developers but only two project managers, it can only develop two projects in parallel (but it can concurrently develop more if it makes the PMs switch projects each day or so).
If a CPU can fetch only from two instruction streams (keeping only two thread contexts) you can assign it only two threads to execute in parallel.
You can however make a time-division and execute more threads concurrently.
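The difference between "in parallel" and "concurrently" can be sketched with a toy round-robin scheduler. This is only an illustration, not how any real OS scheduler works; the function `time_slice`, the thread names, and the slice count are all hypothetical.

```python
from collections import deque


def time_slice(threads, contexts, slices):
    """Round-robin `threads` over `contexts` hardware thread contexts.

    Returns one entry per time slice; each entry lists the threads that
    run in parallel during that slice (never more than `contexts`).
    """
    queue = deque(threads)
    schedule = []
    for _ in range(slices):
        # Only as many threads as there are contexts can run at once.
        running = [queue.popleft() for _ in range(min(contexts, len(queue)))]
        schedule.append(running)
        # Preempted threads go to the back of the queue.
        queue.extend(running)
    return schedule


if __name__ == "__main__":
    plan = time_slice(["A", "B", "C", "D", "E"], contexts=2, slices=5)
    for t, running in enumerate(plan):
        print(f"slice {t}: {running}")
```

With two contexts, at most two threads run in any slice, yet all five make progress over the five slices: that is the PMs switching projects.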
The software has no direct access to the execution units; that would make a circular argument (the software needs the EUs to execute, but the EUs need the software to execute).
The CPU will try to use as many of the EUs as possible, exploiting out-of-order execution and speculating on anything it can.
Actually, hyper-threading is just a way to keep all the resources busy (like sharing a developer with another PM when they have little to do).
But if all else fails and an EU is not used, then that possible unit of work is simply wasted.
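A toy model makes the "keep the resources busy" point concrete. It assumes a core with 4 execution units and two hypothetical threads, each described only by how many independent instructions it has ready in each cycle (0 stands for a stall such as a cache miss). Real cores are far more complicated; the numbers and the function `wasted_slots` are invented for illustration.

```python
def wasted_slots(ready_per_cycle, num_units):
    """Count execution-unit slots left idle over the given cycles.

    `ready_per_cycle[i]` is the number of independent instructions ready
    to issue in cycle i, summed over the threads sharing the core.
    """
    return sum(max(num_units - ready, 0) for ready in ready_per_cycle)


def compare(thread_a, thread_b, num_units):
    """Return (idle slots when run one after the other, idle slots with SMT)."""
    alone = wasted_slots(thread_a, num_units) + wasted_slots(thread_b, num_units)
    combined = [a + b for a, b in zip(thread_a, thread_b)]
    return alone, wasted_slots(combined, num_units)


if __name__ == "__main__":
    # Hypothetical per-cycle ready counts for two threads.
    alone, shared = compare([1, 4, 0, 2], [3, 0, 4, 1], num_units=4)
    print("idle slots, one thread at a time:", alone)   # 17 of 32 slots
    print("idle slots, two threads sharing: ", shared)  # 1 of 16 slots
```

In this contrived case the second instruction stream fills almost every slot the first leaves empty, and the one remaining idle slot is work that is simply lost.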