
What does the processor do while waiting for a main memory fetch?

Assuming the L1 and L2 cache requests both miss, does the processor stall until main memory has been accessed?

I have heard of the idea of switching to another thread in the meantime. If that happens, what is used to wake up the stalled thread?

user1223028 asked Aug 29 '14 07:08



1 Answer

There are many, many things going on in a modern CPU at the same time. Of course anything needing the result of the memory access cannot proceed, but there may be plenty more things to do. Assume the following C code:

double sum = 0.0; 
for (int i = 0; i < 4; ++i) sum += a [i];

if (sum > 10.0) call_some_function ();

and assume that reading the array a misses the cache and stalls. Since the load of a [0] stalls, the addition sum += a [0] stalls too. However, the processor goes on executing other instructions that do not depend on the loaded value: incrementing i, checking that i < 4, looping, and issuing the load of a [1]. That load stalls as well, and so does the second addition sum += a [1], this time because neither the correct value of sum nor the value of a [1] is known. Still, things go on, and eventually the code reaches the statement "if (sum > 10.0)".
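To make the point about independent work concrete, here is a minimal sketch contrasting two access patterns. The names sum_array, sum_list and struct node are hypothetical and are not part of the code above. In the array version each load address depends only on i, so an out-of-order core can have several loads outstanding at once. In the list version each address comes out of the previous load, so the misses have to be served one after another.

```c
#include <stddef.h>

/* Independent loads: the address of a[i] depends only on i, so an
   out-of-order core can issue several loads before the first returns. */
double sum_array(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];          /* only the additions wait for the data */
    return sum;
}

/* Hypothetical linked-list node, used only for contrast. */
struct node {
    double value;
    struct node *next;
};

/* Dependent loads: the address of the next node is itself the result of
   the previous load, so the cache misses cannot overlap. */
double sum_list(const struct node *p)
{
    double sum = 0.0;
    while (p != NULL) {
        sum += p->value;
        p = p->next;          /* next address unknown until this load completes */
    }
    return sum;
}
```

Both functions compute the same kind of result; the difference only shows up in how much of the memory latency the hardware can overlap, which depends on the particular CPU and on where the data sits.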

The processor at this point has no idea what sum is. However, it can predict the outcome of the branch, based on what happened in previous branches, and start executing the function call_some_function () speculatively. So it continues running, but carefully: when call_some_function () stores things to memory, the stores are held back inside the processor and are not made visible in memory yet.

Eventually reading a [0] succeeds, many cycles later. When that happens, it will be added to sum, then a [1] will be added to sum, then a [2], then a [3], and then the comparison sum > 10.0 will be performed properly. Only then does the prediction for the branch turn out to be correct or incorrect. If it was incorrect, all the results of call_some_function () are thrown away. If it was correct, all the results of call_some_function () are turned from speculative results into real results.
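The "speculative results become real results or are thrown away" step can be illustrated with a toy software model. This is only a sketch under loud assumptions: struct toy_core, speculative_store and resolve_branch are made-up names, the sizes are arbitrary, and a real core does this in hardware with far more machinery. The idea it shows is just that speculative stores wait in a buffer until the branch is resolved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MEM_WORDS   16   /* size of the toy memory (assumption) */
#define MAX_PENDING 8    /* capacity of the toy store buffer (assumption) */

struct pending_store { size_t addr; int value; };

struct toy_core {
    int memory[MEM_WORDS];                 /* state visible to everyone */
    struct pending_store buf[MAX_PENDING]; /* speculative stores, not yet visible */
    size_t pending;
};

/* A store executed speculatively goes into the buffer, not into memory.
   Returns false if the buffer is full or the address is out of range. */
bool speculative_store(struct toy_core *c, size_t addr, int value)
{
    if (c->pending == MAX_PENDING || addr >= MEM_WORDS)
        return false;
    c->buf[c->pending].addr = addr;
    c->buf[c->pending].value = value;
    c->pending++;
    return true;
}

/* Called once the branch condition is finally known. */
void resolve_branch(struct toy_core *c, bool prediction_correct)
{
    if (prediction_correct) {
        /* Commit the buffered stores in program order. */
        for (size_t i = 0; i < c->pending; ++i)
            c->memory[c->buf[i].addr] = c->buf[i].value;
    }
    /* Either way the buffer is emptied; on a misprediction the
       buffered stores are simply dropped. */
    c->pending = 0;
}
```

In this model, a call to speculative_store leaves memory untouched until resolve_branch is called with true; calling it with false discards everything that was buffered since the prediction.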

If the stall takes too long, the processor will eventually run out of things to do. It can easily keep track of the four additions and one compare that couldn't be executed yet, but the number of instructions it can hold in flight is limited, and once that limit is reached the processor must stop. However, on a hyper-threaded system, you have another thread on the same core that can continue running happily, and at a higher speed because nobody else is using the core, so the core as a whole can still go on doing useful work.
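A back-of-envelope model shows why a long stall cannot be hidden completely. It assumes a fixed-size window of in-flight instructions that fills at a constant rate while the stalled load blocks everything behind it from completing. The function names and all numbers used with them are illustrative assumptions, not figures for any real processor.

```c
/* Cycles until the in-flight instruction window is full, assuming the core
   adds issue_width instructions per cycle while the oldest instruction
   (the stalled load) cannot complete. Rounds up.
   issue_width must be non-zero. */
unsigned cycles_until_window_full(unsigned window_entries, unsigned issue_width)
{
    return (window_entries + issue_width - 1) / issue_width;
}

/* Cycles of the miss during which this thread can do nothing at all,
   i.e. the part of the latency that the window could not cover. */
unsigned fully_stalled_cycles(unsigned miss_latency,
                              unsigned window_entries,
                              unsigned issue_width)
{
    unsigned hidden = cycles_until_window_full(window_entries, issue_width);
    return miss_latency > hidden ? miss_latency - hidden : 0;
}
```

With made-up numbers, a 200-entry window filled at 4 instructions per cycle is full after 50 cycles, so a 300-cycle miss would leave this thread idle for 250 cycles. Those idle cycles are what a second hardware thread on the same core can use.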

gnasher729 answered Oct 06 '22 00:10
