I wrote a 32-bit Win32 application (in Delphi 7, using the TThread class) that creates 100 threads. Each thread, when resumed, continuously increments (in a loop) a 64-bit counter associated with the thread object, so there is no locking or sharing of data.
If you let it run for 10 to 15 seconds and then stop, you would expect roughly the same count in each thread. But what I observed was that 81 threads completed fewer than 400 million loop iterations while the remaining ones looped more than 950 million times. The slowest thread reached only 230 million, compared with 2111 million for the fastest.
According to MSDN, preemptive multitasking is done at the thread level (not the process level), so each of my threads should have received its time slice in a round-robin fashion. What am I missing here, and why is there such a discrepancy?
Edit 1: Machine configuration: Intel i7 quad-core at 3.4 GHz with hyper-threading turned on (8 hardware threads at a time), running Windows 7 Professional 64-bit (the test application is 32-bit).
Edit 2 (thread code): The test application is built with optimization turned on and without debug info, and is run outside the IDE.
type
  TMyThread = class(TThread)
  protected
    FCount: Int64;
  public
    constructor Create;
    procedure Execute; override;
    property Count: Int64 read FCount;
  end;

{ TMyThread }

constructor TMyThread.Create;
begin
  inherited Create(True);  // create suspended; resumed later by the driver code
  FCount := 0;
end;

procedure TMyThread.Execute;
begin
  inherited;
  while not Terminated do
  begin
    Inc(FCount);
  end;
end;
Preemptive multitasking is the scheme used by desktop operating systems: the operating system decides how much time one task gets before another task is given the CPU, and because the operating system controls the switching it is called "preemptive". Unix was the first operating system to use this method of multitasking; Windows NT and Windows 95 were the first versions of Windows to use it, and with OS X the Macintosh acquired preemptive multitasking as well.
PCs and Macs had a lot of existing software written for their non-multitasking operating systems, and adding preemptive multitasking without breaking those applications was very difficult. Before multitasking, many applications accessed the hardware directly, did not consider things like shared filesystem access, and of course were not event driven.
Round-robin scheduling is an obvious strategy for a kernel, but that's not the way the Windows scheduler works. It used to, back in the Windows 9x days, when the scheduler was quite capable of giving the various VMs equal time. But not in the NT branch, started by Dave Cutler's group: there, scheduling is purely based on priority.
Whatever thread has the highest priority gets the CPU. There is another chunk of code in Windows that tinkers with a thread's priority, modifying it from the default priority it received when it was created. That code is aware of things like a thread owning the foreground window, or a thread waiting on a synchronization object that just got signaled, or the more bizarre scheduling tweak that tries to work around priority inversion by randomly giving a thread a chance to run even though it wasn't its turn.
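The effect of priority-based scheduling is easy to demonstrate. A hedged sketch, reusing the illustrative Threads array and ThreadCount constant from the driver sketch above: bump a handful of threads above the default and, on a saturated machine, their counters will race far ahead of the rest.

// Illustrative only: with all cores busy, the tpHigher threads are
// scheduled ahead of the tpNormal ones and accumulate much larger counts.
for I := 0 to ThreadCount - 1 do
  if I < 10 then
    Threads[I].Priority := tpHigher   // above the default tpNormal
  else
    Threads[I].Priority := tpNormal;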
Focus on writing sane code first. Starting a hundred threads isn't a very sane thing to do: you are trying to consume resources that the machine doesn't actually have available; nobody has a machine with a hundred cores. Yet. Powers of two: get a machine with 128 cores first.
I have reproduced your results and can confirm them. Additionally, disabling thread priority boost doesn't change the distribution. GetThreadTimes reports that threads with higher counter values got more UserTime and vice versa, while KernelTime seems to have no correlation with the values.
Thread 97: 1081,5928 Ke:0 Us:25116161
Thread 98: 1153,8029 Ke:0 Us:26988173
Thread 99: 704,6996 Ke:0 Us:16848108
Clearly, some threads really get to run more often than others.
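For completeness, a sketch of how those figures can be gathered. GetThreadTimes fills in FILETIME values (100-nanosecond units) for kernel and user time, and SetThreadPriorityBoost is one way to disable the dynamic boost mentioned above; both are declared in the Windows unit. The helper names below are mine:

uses Windows;   // GetThreadTimes, SetThreadPriorityBoost, TFileTime

// Pack a FILETIME (100-nanosecond units) into an Int64 for printing
function FileTimeToInt64(const FT: TFileTime): Int64;
begin
  Result := (Int64(FT.dwHighDateTime) shl 32) or FT.dwLowDateTime;
end;

procedure ReportThreadTimes(ThreadHandle: THandle);
var
  CreationTime, ExitTime, KernelTime, UserTime: TFileTime;
begin
  if GetThreadTimes(ThreadHandle, CreationTime, ExitTime, KernelTime, UserTime) then
    WriteLn('Ke:', FileTimeToInt64(KernelTime),
            ' Us:', FileTimeToInt64(UserTime));
end;

// Disabling the dynamic priority boost is one call per thread, e.g.:
//   SetThreadPriorityBoost(SomeThread.Handle, True);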
I haven't graphed the results, but I suppose what we're seeing is a normal distribution, which means the results depend on a number of factors, some of which are random.
I tried disabling hyper-threading (which smoothed the results somewhat), then assigning each thread to a single physical processor (using SetThreadAffinityMask). In the second case, the values were much closer to each other.
SetThreadAffinityMask(Self.Handle, 1 shl (FIndex mod 4));
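For context, a sketch of where that call might live: inside each thread's Execute, before the counting loop. FIndex is not part of the original class; it is assumed here to be a zero-based index assigned to the thread at creation, and the mod 4 maps the 100 threads onto the four physical cores.

procedure TMyThread.Execute;
begin
  inherited;
  // Pin this thread to one of the four physical cores (FIndex is a
  // hypothetical per-thread index set in the constructor)
  SetThreadAffinityMask(Self.Handle, 1 shl (FIndex mod 4));
  while not Terminated do
    Inc(FCount);
end;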
I can sort of understand how running on a hyper-threaded system can make some threads "unlucky": they are scheduled to compete with other threads on the same physical processor, and because of "soft affinity" to this virtual core they get to run on it again and again, thus scoring lower than others.
But as to why binding each thread to a fixed core helps on a non-hyperthreaded system, I don't know.
There are probably other random factors involved, such as activity on the cores from other processes. A thread can get "unlucky" if some other process's thread associated with the same core suddenly wakes up and starts doing (relatively) heavy work.
All of this is guessing though.