Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multithreaded program becomes nonresponsive on _multiple_ CPUs, but fine on a single CPU (when updating a ListView)

Update: I've reproduced the problem! Scroll lower to see the code.

Quick Notes

  • My Core i5 CPU has 2 cores, hyperthreading.

  • If I call SetProcessAffinityMask(GetCurrentProcess(), 1), everything is fine, even though the program is still multithreaded.

  • If I don't do that, and the program is running on Windows XP (it's fine on Windows 7 x64!), my GUI starts locking up for several seconds while I'm scrolling the list view and the icons are loading.

The Problem

Basically, when I run the program posted below (a reduced version of my original code) on Windows XP (Windows 7 is fine), unless I force the same logical CPU for all my threads, the program UI starts lagging behind by half a second or so.

(Note: Lots of edits to this post here, as I investigated the problem further.)

Note that the number of threads is the same -- only the affinity mask is different.

I've tried this out using two different methods of message-passing: the built-in GetMessage as well as my own BackgroundWorker.

The result? BackgroundWorker benefits from affinity for 1 logical CPU (virtually no lag), whereas GetMessage is completely hurt by this, (lag is now many seconds long).

I can't figure out why that would be happening -- shouldn't multiple CPUs work better than a single CPU?!
Why would there be such a lag, when the number of threads is the same?


More stats:

GetLogicalProcessorInformation returns:

0x0: {ProcessorMask=0x0000000000000003 Relationship=RelationProcessorCore ...}
0x1: {ProcessorMask=0x0000000000000003 Relationship=RelationCache ...}
0x2: {ProcessorMask=0x0000000000000003 Relationship=RelationCache ...}
0x3: {ProcessorMask=0x0000000000000003 Relationship=RelationCache ...}
0x4: {ProcessorMask=0x000000000000000f Relationship=RelationProcessorPackage ...}
0x5: {ProcessorMask=0x000000000000000c Relationship=RelationProcessorCore ...}
0x6: {ProcessorMask=0x000000000000000c Relationship=RelationCache ...}
0x7: {ProcessorMask=0x000000000000000c Relationship=RelationCache ...}
0x8: {ProcessorMask=0x000000000000000c Relationship=RelationCache ...}
0x9: {ProcessorMask=0x000000000000000f Relationship=RelationCache ...}
0xa: {ProcessorMask=0x000000000000000f Relationship=RelationNumaNode ...}

The Code

The code below should shows this problem on Windows XP SP3. (At least, it does on my computer!)

Compare these two:

  • Run the program normally, then scroll. You should see lag.

  • Run the program with the affinity command-line argument, then scroll. It should be almost completely smooth.

Why would this happen?

#define _WIN32_WINNT 0x502

#include <tchar.h>
#include <Windows.h>
#include <CommCtrl.h>

#pragma comment(lib, "kernel32.lib")
#pragma comment(lib, "comctl32.lib")
#pragma comment(lib, "user32.lib")

LONGLONG startTick = 0;

LONGLONG QPC()
{ LARGE_INTEGER v; QueryPerformanceCounter(&v); return v.QuadPart; }

LONGLONG QPF()
{ LARGE_INTEGER v; QueryPerformanceFrequency(&v); return v.QuadPart; }

bool logging = false;
bool const useWindowMessaging = true;   // GetMessage() or BackgroundWorker?
bool const autoScroll = false;   // for testing

class BackgroundWorker
{
    struct Thunk
    {
        virtual void operator()() = 0;
        virtual ~Thunk() { }
    };
    class CSLock
    {
        CRITICAL_SECTION& cs;
    public:
        CSLock(CRITICAL_SECTION& criticalSection)
            : cs(criticalSection)
        { EnterCriticalSection(&this->cs); }
        ~CSLock() { LeaveCriticalSection(&this->cs); }
    };
    template<typename T>
    class ScopedPtr
    {
        T *p;
        ScopedPtr(ScopedPtr const &) { }
        ScopedPtr &operator =(ScopedPtr const &) { }
    public:
        ScopedPtr() : p(NULL) { }
        explicit ScopedPtr(T *p) : p(p) { }
        ~ScopedPtr() { delete p; }
        T *operator ->() { return p; }
        T &operator *() { return *p; }
        ScopedPtr &operator =(T *p)
        {
            if (this->p != NULL) { __debugbreak(); }
            this->p = p;
            return *this;
        }
        operator T *const &() { return this->p; }
    };

    Thunk **const todo;
    size_t nToDo;
    CRITICAL_SECTION criticalSection;
    DWORD tid;
    HANDLE hThread, hSemaphore;
    volatile bool stop;
    static size_t const MAX_TASKS = 1 << 18;  // big enough for testing

    static DWORD CALLBACK entry(void *arg)
    { return ((BackgroundWorker *)arg)->process(); }

public:
    BackgroundWorker()
        : nToDo(0), todo(new Thunk *[MAX_TASKS]), stop(false), tid(0),
        hSemaphore(CreateSemaphore(NULL, 0, 1 << 30, NULL)),
        hThread(CreateThread(NULL, 0, entry, this, CREATE_SUSPENDED, &tid))
    {
        InitializeCriticalSection(&this->criticalSection);
        ResumeThread(this->hThread);
    }

    ~BackgroundWorker()
    {
        // Clear all the tasks
        this->stop = true;
        this->clear();
        LONG prev;
        if (!ReleaseSemaphore(this->hSemaphore, 1, &prev) ||
            WaitForSingleObject(this->hThread, INFINITE) != WAIT_OBJECT_0)
        { __debugbreak(); }
        CloseHandle(this->hSemaphore);
        CloseHandle(this->hThread);
        DeleteCriticalSection(&this->criticalSection);
        delete [] this->todo;
    }

    void clear()
    {
        CSLock lock(this->criticalSection);
        while (this->nToDo > 0)
        {
            delete this->todo[--this->nToDo];
        }
    }

    unsigned int process()
    {
        DWORD result;
        while ((result = WaitForSingleObject(this->hSemaphore, INFINITE))
            == WAIT_OBJECT_0)
        {
            if (this->stop) { result = ERROR_CANCELLED; break; }
            ScopedPtr<Thunk> next;
            {
                CSLock lock(this->criticalSection);
                if (this->nToDo > 0)
                {
                    next = this->todo[--this->nToDo];
                    this->todo[this->nToDo] = NULL;  // for debugging
                }
            }
            if (next) { (*next)(); }
        }
        return result;
    }

    template<typename Func>
    void add(Func const &func)
    {
        CSLock lock(this->criticalSection);
        struct FThunk : public virtual Thunk
        {
            Func func;
            FThunk(Func const &func) : func(func) { }
            void operator()() { this->func(); }
        };
        DWORD exitCode;
        if (GetExitCodeThread(this->hThread, &exitCode) &&
            exitCode == STILL_ACTIVE)
        {
            if (this->nToDo >= MAX_TASKS) { __debugbreak(); /*too many*/ }
            if (this->todo[this->nToDo] != NULL) { __debugbreak(); }
            this->todo[this->nToDo++] = new FThunk(func);
            LONG prev;
            if (!ReleaseSemaphore(this->hSemaphore, 1, &prev))
            { __debugbreak(); }
        }
        else { __debugbreak(); }
    }
};

LRESULT CALLBACK MyWindowProc(
    HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    enum { IDC_LISTVIEW = 101 };
    switch (uMsg)
    {
        case WM_CREATE:
        {
            RECT rc; GetClientRect(hWnd, &rc);

            HWND const hWndListView = CreateWindowEx(
                WS_EX_CLIENTEDGE, WC_LISTVIEW, NULL,
                WS_CHILDWINDOW | WS_VISIBLE | LVS_REPORT |
                LVS_SHOWSELALWAYS | LVS_SINGLESEL | WS_TABSTOP,
                rc.left, rc.top, rc.right - rc.left, rc.bottom - rc.top,
                hWnd, (HMENU)IDC_LISTVIEW, NULL, NULL);

            int const cx = GetSystemMetrics(SM_CXSMICON),
                cy = GetSystemMetrics(SM_CYSMICON);

            HIMAGELIST const hImgList =
                ImageList_Create(
                    GetSystemMetrics(SM_CXSMICON),
                    GetSystemMetrics(SM_CYSMICON),
                    ILC_COLOR32, 1024, 1024);

            ImageList_AddIcon(hImgList, (HICON)LoadImage(
                NULL, IDI_INFORMATION, IMAGE_ICON, cx, cy, LR_SHARED));

            LVCOLUMN col = { LVCF_TEXT | LVCF_WIDTH, 0, 500, TEXT("Name") };
            ListView_InsertColumn(hWndListView, 0, &col);
            ListView_SetExtendedListViewStyle(hWndListView,
                LVS_EX_DOUBLEBUFFER | LVS_EX_FULLROWSELECT | LVS_EX_GRIDLINES);
            ListView_SetImageList(hWndListView, hImgList, LVSIL_SMALL);

            for (int i = 0; i < (1 << 11); i++)
            {
                TCHAR text[128]; _stprintf(text, _T("Item %d"), i);
                LVITEM item =
                {
                    LVIF_IMAGE | LVIF_TEXT, i, 0, 0, 0,
                    text, 0, I_IMAGECALLBACK
                };
                ListView_InsertItem(hWndListView, &item);
            }

            if (autoScroll)
            {
                SetTimer(hWnd, 0, 1, NULL);
            }

            break;
        }
        case WM_TIMER:
        {
            HWND const hWndListView = GetDlgItem(hWnd, IDC_LISTVIEW);
            RECT rc; GetClientRect(hWndListView, &rc);
            if (!ListView_Scroll(hWndListView, 0, rc.bottom - rc.top))
            {
                KillTimer(hWnd, 0);
            }
            break;
        }
        case WM_NULL:
        {
            HWND const hWndListView = GetDlgItem(hWnd, IDC_LISTVIEW);
            int const iItem = (int)lParam;
            if (logging)
            {
                _tprintf(_T("@%I64lld ms:")
                    _T(" Received: #%d\n"),
                    (QPC() - startTick) * 1000 / QPF(), iItem);
            }
            int const iImage = 0;
            LVITEM const item = {LVIF_IMAGE, iItem, 0, 0, 0, NULL, 0, iImage};
            ListView_SetItem(hWndListView, &item);
            ListView_Update(hWndListView, iItem);
            break;
        }
        case WM_NOTIFY:
        {
            LPNMHDR const pNMHDR = (LPNMHDR)lParam;
            switch (pNMHDR->code)
            {
            case LVN_GETDISPINFO:
                {
                    NMLVDISPINFO *const pInfo = (NMLVDISPINFO *)lParam;
                    struct Callback
                    {
                        HWND hWnd;
                        int iItem;
                        void operator()()
                        {
                            if (logging)
                            {
                                _tprintf(_T("@%I64lld ms: Sent:     #%d\n"),
                                    (QPC() - startTick) * 1000 / QPF(),
                                    iItem);
                            }
                            PostMessage(hWnd, WM_NULL, 0, iItem);
                        }
                    };
                    if (pInfo->item.iImage == I_IMAGECALLBACK)
                    {
                        if (useWindowMessaging)
                        {
                            DWORD const tid =
                                (DWORD)GetWindowLongPtr(hWnd, GWLP_USERDATA);
                            PostThreadMessage(
                                tid, WM_NULL, 0, pInfo->item.iItem);
                        }
                        else
                        {
                            Callback callback = { hWnd, pInfo->item.iItem };
                            if (logging)
                            {
                                _tprintf(_T("@%I64lld ms: Queued:   #%d\n"),
                                    (QPC() - startTick) * 1000 / QPF(),
                                    pInfo->item.iItem);
                            }
                            ((BackgroundWorker *)
                             GetWindowLongPtr(hWnd, GWLP_USERDATA))
                                ->add(callback);
                        }
                    }
                    break;
                }
            }
            break;
        }
        
        case WM_CLOSE:
        {
            PostQuitMessage(0);
            break;
        }
    }
    return DefWindowProc(hWnd, uMsg, wParam, lParam);
}

DWORD WINAPI BackgroundWorkerThread(LPVOID lpParameter)
{
    HWND const hWnd = (HWND)lpParameter;
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0 && msg.message != WM_QUIT)
    {
        if (msg.message == WM_NULL)
        {
            PostMessage(hWnd, msg.message, msg.wParam, msg.lParam);
        }
    }
    return 0;
}

int _tmain(int argc, LPTSTR argv[])
{
    startTick = QPC();
    bool const affinity = argc >= 2 && _tcsicmp(argv[1], _T("affinity")) == 0;
    if (affinity)
    { SetProcessAffinityMask(GetCurrentProcess(), 1 << 0); }

    bool const log = logging;  // disable temporarily
    logging = false;

    WNDCLASS wndClass =
    {
        0, &MyWindowProc, 0, 0, NULL, NULL, LoadCursor(NULL, IDC_ARROW),
        GetSysColorBrush(COLOR_3DFACE), NULL, TEXT("MyClass")
    };
    HWND const hWnd = CreateWindow(
        MAKEINTATOM(RegisterClass(&wndClass)),
        affinity ? TEXT("Window (1 CPU)") : TEXT("Window (All CPUs)"),
        WS_OVERLAPPEDWINDOW | WS_VISIBLE, CW_USEDEFAULT, CW_USEDEFAULT,
        CW_USEDEFAULT, CW_USEDEFAULT, NULL, NULL, NULL, NULL);

    BackgroundWorker iconLoader;
    DWORD tid = 0;
    if (useWindowMessaging)
    {
        CreateThread(NULL, 0, &BackgroundWorkerThread, (LPVOID)hWnd, 0, &tid);
        SetWindowLongPtr(hWnd, GWLP_USERDATA, tid);
    }
    else { SetWindowLongPtr(hWnd, GWLP_USERDATA, (LONG_PTR)&iconLoader); }
    
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0)
    {
        if (!IsDialogMessage(hWnd, &msg))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }

        if (msg.message == WM_TIMER ||
            !PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE))
        { logging = log; }
    }

    PostThreadMessage(tid, WM_QUIT, 0, 0);
    return 0;
}
like image 291
user541686 Avatar asked Jul 04 '12 22:07

user541686


2 Answers

Based on the inter-thread timings you posted at http://ideone.com/fa2fM, it looks like there is a fairness issue at play here. Based solely on this assumption, here is my reasoning as to the apparent cause of the perceived lag and a potential solution to the problem.

It looks like there is a large number of LVN_GETDISPINFO messages being generated and processed on one thread by the window proc, and while the background worker thread is able to keep up and post messages back to the window at the same rate, the WM_NULL messages it posts are so far back in the queue that it takes time before they get handled.

When you set the processor affinity mask, you introduce more fairness into the system because the same processor must service both threads, which will limit the rate at which LVN_GETDISPINFO messages are generated relative to the non-affinity case. This means that the window proc message queue is likely not as deep when you post your WM_NULL messages, which in turn means that they will be processed 'sooner'.

It seems that you need to somehow bypass the queueing effect. Using SendMessage, SendMessageCallback or SendNotifyMessage instead of PostMessage may be ways to do this. In the SendMessage case, your worker thread will block until the window proc thread is finished its current message and processes the sent WM_NULL message, but you will be able to inject your WM_NULL messages more evenly into the message processing flow. See this page for an explanation of queued vs. non-queued message handling.

If you choose to use SendMessage, but you don't want to limit the rate at which you can obtain icons due to the blocking nature of SendMessage, then you can use a third thread. Your I/O thread would post messages to the third thread, while the third thread uses SendMessage to inject icon updates into the UI thread. In this fashion, you have control of the queue of satisfied icon requests, instead of interleaving them into the window proc message queue.

As for the difference in behaviour between Win7 and WinXP, there may be a number of reasons why you don't appear to see this effect on Win7. It could be that the list view common control is implemented differently and limits the rate at which LVN_GETDISPINFO messages are generated. Or perhaps the thread scheduling mechanism in Win7 switches thread contexts more frequently or more fairly.

EDIT:

Based on your latest change, try the following:

...

                struct Callback 
                { 
                    HWND hWnd; 
                    int iItem; 
                    void operator()() 
                    { 
                        if (logging) 
                        { 
                            _tprintf(_T("@%I64lld ms: Sent:     #%d\n"), 
                                (QPC() - startTick) * 1000 / QPF(), 
                                iItem); 
                        } 
                        SendNotifyMessage(hWnd, WM_NULL, 0, iItem); // <----
                    } 
                }; 


...

DWORD WINAPI BackgroundWorkerThread(LPVOID lpParameter) 
{ 
    HWND const hWnd = (HWND)lpParameter; 
    MSG msg; 
    while (GetMessage(&msg, NULL, 0, 0) > 0 && msg.message != WM_QUIT) 
    { 
        if (msg.message == WM_NULL) 
        { 
            SendNotifyMessage(hWnd, msg.message, msg.wParam, msg.lParam); // <----
        } 
    } 
    return 0; 
} 

EDIT 2:

After establishing that the LVN_GETDISPINFO message are being placed in the queue using SendMessage instead of PostMessage, we can't use SendMessage ourselves to bypass them.

Still proceeding on the assumption that there is a glut of messages being processed by the wndproc before the icon results are being sent back from the worker thread, we need another way to get those updates handled as soon as they are ready.

Here's the idea:

  1. Worker thread places results in a synchronized queue-like data structure, and then posts (using PostMessage) a WM_NULL message to the wndproc (to ensure that the wndproc gets executed sometime in the future).

  2. At the top of the wndproc (before the case statements), the UI thread checks the synchronized queue-like data structure to see if there are any results, and if so, removes one or more results from the queue-like data structure and processes them.

like image 125
Monroe Thomas Avatar answered Nov 30 '22 05:11

Monroe Thomas


The issue has less to do with thread affinity and more to do with telling the listview that it needs to update the list item every time you update it. Because you do not add the LVIF_DI_SETITEM flag to pInfo->item.mask in your LVN_GETDISPINFO handler, and because you call ListView_Update manually, when you call ListView_Update, the list view invalidates any item that still has its iImage set to I_IMAGECALLBACK.

You can fix this in one of two ways (or a combination of both):

  1. Remove ListView_Update from your WM_NULL handler. The list view will automatically redraw the items you set the image for in your WM_NULL handler when you set them, and it will not attempt to redraw items you haven't set the image for more than once.

  2. Set LVIF_DI_SETITEM flag in pInfo->item.mask in your LVN_GETDISPINFO handler and set pInfo->item.iImage to a value that is not I_IMAGECALLBACK.

I repro'd similar awful behavior doing a full page scroll on Vista. Doing either of the above fixed the issue while still updating the icons asynchronously.

like image 34
MSN Avatar answered Nov 30 '22 04:11

MSN