Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

IOCP threads - Clarification?

After reading this article which states :

After a device finishes its job , (IO operation)- it notifies the CPU via interrupt.

... ... ...

However, that “completion” status only exists at the OS level; the process has its own memory space that must be notified

... ... ...

Since the library/BCL is using the standard P/Invoke overlapped I/O system, it has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool.

... ... ...

So an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it’s complete.

I was interesting about the bold part :

If I understood correctly , after the the IO operation is finished , it has to notify to the actual process which executed the IO operation.

Question #1:

Does it mean that it grabs a new thread pool thread for each completed IO operation ? Or is it a dedicated number of threads for this ?

Question #2:

Looking at :

for (int i=0;i<1000;i++)
    {
      PingAsync_NOT_AWAITED(i); //notice not awaited !
    }

Does it mean that I'll have 1000 IOCP threadpool thread simultaneously ( sort of) running here , when all are finished ?

like image 979
Royi Namir Avatar asked Feb 24 '15 08:02

Royi Namir


3 Answers

Does it mean that it grabs a new thread pool thread for each completed IO operation ? Or is it a dedicated number of threads for this ?

It would be terribly inefficient to create a new thread for every single I/O request, to the point of defeating the purpose. Instead, the runtime starts off with a small number of threads (the exact number depends on your environment) and adds and removes worker threads as necessary (the exact algorithm for this likewise varies with your environment). Ever major version of .NET has seen changes in this implementation, but the basic idea stays the same: the runtime does its best to create and maintain only as many threads as are necessary to service all I/O efficiently. On my system (Windows 8.1, .NET 4.5.2) a brand new console application has only 3 threads in the process on entering Main, and this number doesn't increase until actual work is requested.

Does it mean that I'll have 1000 IOCP threadpool thread simultaneously ( sort of) running here , when all are finished ?

No. When you issue an I/O request, a thread will be waiting on a completion port to get the result and call whatever callback was registered to handle the result (be it via a BeginXXX method or as the continuation of a task). If you use a task and don't await it, that task simply ends there and the thread is returned to the thread pool.

What if you did await it? The results of 1000 I/O requests won't really arrive all at the same time, since interrupts don't all arrive at the same time, but let's say the interval is much shorter than the time we need to process them. In that case, the thread pool will keep spinning up threads to handle the results until it reaches a maximum, and any further requests will end up queueing on the completion port. Depending on how you configure it, those threads may take some time to spin up.

Consider the following (deliberately awful) toy program:

static void Main(string[] args) {
    printThreadCounts();
    var buffer = new byte[1024];
    const int requestCount = 30;
    int pendingRequestCount = requestCount;
    for (int i = 0; i != requestCount; ++i) {
        var stream = new FileStream(
            @"C:\Windows\win.ini",
            FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 
            buffer.Length, FileOptions.Asynchronous
        );
        stream.BeginRead(
            buffer, 0, buffer.Length,
            delegate {
                Interlocked.Decrement(ref pendingRequestCount);
                Thread.Sleep(Timeout.Infinite);
            }, null
        );
    }
    do {
        printThreadCounts();
        Thread.Sleep(1000);
    } while (Thread.VolatileRead(ref pendingRequestCount) != 0);
    Console.WriteLine(new String('=', 40));
    printThreadCounts();
}

private static void printThreadCounts() {
    int completionPortThreads, maxCompletionPortThreads;
    int workerThreads, maxWorkerThreads;
    ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);
    ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
    Console.WriteLine(
        "Worker threads: {0}, Completion port threads: {1}, Total threads: {2}", 
        maxWorkerThreads - workerThreads, 
        maxCompletionPortThreads - completionPortThreads, 
        Process.GetCurrentProcess().Threads.Count
    );
}

On my system (which has 8 logical processors), the output is as follows (results may vary on your system):

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 8, Total threads: 12
Worker threads: 0, Completion port threads: 9, Total threads: 13
Worker threads: 0, Completion port threads: 11, Total threads: 15
Worker threads: 0, Completion port threads: 13, Total threads: 17
Worker threads: 0, Completion port threads: 15, Total threads: 19
Worker threads: 0, Completion port threads: 17, Total threads: 21
Worker threads: 0, Completion port threads: 19, Total threads: 23
Worker threads: 0, Completion port threads: 21, Total threads: 25
Worker threads: 0, Completion port threads: 23, Total threads: 27
Worker threads: 0, Completion port threads: 25, Total threads: 29
Worker threads: 0, Completion port threads: 27, Total threads: 31
Worker threads: 0, Completion port threads: 29, Total threads: 33
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 34

When we issue 30 asynchronous requests, the thread pool quickly makes 8 threads available to handle the results, but after that it only spins up new threads at a leisurely pace of about 2 per second. This demonstrates that if you want to properly utilize system resources, you'd better make sure that your I/O processing completes quickly. Indeed, let's change our delegate to the following, which represents "proper" processing of the request:

stream.BeginRead(
    buffer, 0, buffer.Length,
    ar => {
        stream.EndRead(ar);
        Interlocked.Decrement(ref pendingRequestCount);
    }, null
);

Result:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 1, Total threads: 11
========================================
Worker threads: 0, Completion port threads: 0, Total threads: 11

Again, results may vary on your system and across runs. Here we barely glimpse the completion port threads in action while the 30 requests we issued are completed without spinning up new threads. You should find that you can change "30" to "100" or even "100000": our loop can't start requests faster than they complete. Note, however, that the results are skewed heavily in our favor because the "I/O" is reading the same bytes over and over and is going to be serviced from the operating system cache and not by reading from a disk. This isn't meant to demonstrate realistic throughput, of course, only the difference in overhead.

To repeat these results with worker threads rather than completion port threads, simply change FileOptions.Asynchronous to FileOptions.None. This makes file access synchronous and the asynchronous operations will be completed on worker threads rather than using the completion port:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 8, Completion port threads: 0, Total threads: 15
Worker threads: 9, Completion port threads: 0, Total threads: 16
Worker threads: 10, Completion port threads: 0, Total threads: 17
Worker threads: 11, Completion port threads: 0, Total threads: 18
Worker threads: 12, Completion port threads: 0, Total threads: 19
Worker threads: 13, Completion port threads: 0, Total threads: 20
Worker threads: 14, Completion port threads: 0, Total threads: 21
Worker threads: 15, Completion port threads: 0, Total threads: 22
Worker threads: 16, Completion port threads: 0, Total threads: 23
Worker threads: 17, Completion port threads: 0, Total threads: 24
Worker threads: 18, Completion port threads: 0, Total threads: 25
Worker threads: 19, Completion port threads: 0, Total threads: 26
Worker threads: 20, Completion port threads: 0, Total threads: 27
Worker threads: 21, Completion port threads: 0, Total threads: 28
Worker threads: 22, Completion port threads: 0, Total threads: 29
Worker threads: 23, Completion port threads: 0, Total threads: 30
Worker threads: 24, Completion port threads: 0, Total threads: 31
Worker threads: 25, Completion port threads: 0, Total threads: 32
Worker threads: 26, Completion port threads: 0, Total threads: 33
Worker threads: 27, Completion port threads: 0, Total threads: 34
Worker threads: 28, Completion port threads: 0, Total threads: 35
Worker threads: 29, Completion port threads: 0, Total threads: 36
========================================
Worker threads: 30, Completion port threads: 0, Total threads: 37

The thread pool spins up one worker thread per second rather than the two it started for completion port threads. Obviously these numbers are implementation-dependent and may change in new releases.

Finally, let's demonstrate the use of ThreadPool.SetMinThreads to ensure a minimum number of threads is available to complete requests. If we go back to FileOptions.Asynchronous and add ThreadPool.SetMinThreads(50, 50) to the Main of our toy program, the result is:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 31, Total threads: 35
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 35

Now, instead of patiently adding one thread every two seconds, the thread pool keeps spinning up threads until the maximum is reached (which doesn't happen in this case, so the final count stays at 30). Of course, all of these 30 threads are stuck in infinite waits -- but if this had been a real system, those 30 threads would now presumably be doing useful if not terribly efficient work. I wouldn't try this with 100000 requests, though.

like image 132
Jeroen Mostert Avatar answered Sep 28 '22 11:09

Jeroen Mostert


This is a bit broad, so let me just address the major points:

The IOCP threads are on a separate thread pool, so to speak - that's the I/O threads setting. So they do not clash with the user thread-pool threads (like the ones you have in normal await operations or ThreadPool.QueueWorkerItem).

Just like the normal thread pool, it will only allocate new threads slowly over time. So even if there's a peak of async responses that happen all at once, you're not going to have 1000 I/O threads.

In a properly asynchronous application, you're not going to have more than the number of cores, give or take, just like with the worker threads. That's because you're either doing significant CPU work and you shold post it on a normal worker thread or you're doing I/O work and you should do that as an asynchronous operation.

The idea is that you spend very little time in the I/O callback - you don't block, and you don't do a lot of CPU work. If you violate this (say, add Thread.Sleep(10000) to your callback), then yes, .NET will create tons and tons of IO threads over time - but that's just improper usage.

Now, how are I/O threads different from normal CPU threads? They're almost the same, they just wait for a different signal - both are (simplification alert) just a while loop over a method that gives control when a new work item is queued by some other part of the application (or the OS). The main difference is that I/O threads are using IOCP queue (OS managed), while normal worker threads have their own queue, completely .NET managed and accessible by the application programmer.

As a side note, don't forget that your request might have completed synchronously. Perhaps you're reading from a TCP stream in a while loop, 512 bytes at a time. If the socket buffer has enough data in it, multiple ReadAsyncs can return immediately without doing any thread switching at all. This isn't usually a problem because I/O tends to be the most time-intensive stuff you do in a typical application, so not having to wait for I/O is usually fine. However, bad code depending on some part happenning asynchronously (even though that isn't guaranteeed) can easily break your application.

like image 44
Luaan Avatar answered Sep 28 '22 12:09

Luaan


Does it mean that I'll have 1000 IOCP threadpool thread simultaneously ( sort of) running here , when all are finished ?

No, not at all. Same like the worker threads available in ThreadPool we also have "Completion port threads".

These threads are dedicated for Async I/O. There will not be threads created upfront. They are created on demand the sameway as worker threads. They will be destroyed eventually when threadpool decides.

By borrowed briefly author means that to notify the completion of IO to the process some arbitrary thread from "Completion port threads"(of ThreadPool) is used. It will not be executing any lengthy operation but completion of IO notification.

like image 39
Sriram Sakthivel Avatar answered Sep 28 '22 10:09

Sriram Sakthivel