Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding VS2010 C# parallel profiling results

I have a program with many independent computations so I decided to parallelize it.

I use Parallel.For/Each.

The results were okay for a dual-core machine - CPU utilization of about 80%-90% most of the time. However, with a dual Xeon machine (i.e. 8 cores) I get only about 30%-40% CPU utilization, although the program spends quite a lot of time (sometimes more than 10 seconds) on the parallel sections, and I see it employs about 20-30 more threads in those sections compared to serial sections. Each thread takes more than 1 second to complete, so I see no reason for them to not work in parallel - unless there is a synchronization problem.

I used the built-in profiler of VS2010, and the results are strange. Even though I use locks only in one place, the profiler reports that about 85% of the program's time is spent on synchronization (also 5-7% sleep, 5-7% execution, under 1% IO).

The locked code is only a cache (a dictionary) get/add:

bool esn_found;
lock (lock_load_esn)
    esn_found = cache.TryGetValue(st, out esn);
if(!esn_found)
{
    esn = pData.esa_inv_idx.esa[term_idx];
    esn.populate(pData.esa_inv_idx.datafile);
    lock (lock_load_esn)
    {
        if (!cache.ContainsKey(st))
            cache.Add(st, esn);
    }
}

lock_load_esn is a static member of the class of type Object.
esn.populate reads from a file using a separate StreamReader for each thread.

However, when I press the Synchronization button to see what causes the most delay, I see that the profiler reports lines which are function entrance lines, and doesn't report the locked sections themselves.
It doesn't even report the function that contains the above code (reminder - the only lock in the program) as part of the blocking profile with noise level 2%. With noise level at 0% it reports all the functions of the program, which I don't understand why they count as blocking synchronizations.

So my question is - what is going on here?
How can it be that 85% of the time is spent on synchronization?
How do I find out what really is the problem with the parallel sections of my program?

Thanks.

Update: After drilling down into the threads (using the extremely useful visualizer) I found out that most of the synchronization time was spent on waiting for the GC thread to complete memory allocations, and that frequent allocations were needed because of generic data structures resize operations.

I'll have to see how to initialize my data structures so that they allocate enough memory on initialization, possibly avoiding this race for the GC thread.

I'll report the results later today.

Update: It appears memory allocations were indeed the cause of the problem. When I used initial capacities for all Dictionaries and Lists in the parallel executed class, the synchronization problem were smaller. I now had only about 80% Synchronization time, with spikes of 70% CPU utilization (previous spikes were only about 40%).

I drilled even further into each thread and discovered that now many calls to GC allocate were made for allocating small objects which were not part of the large dictionaries.

I solved this issue by providing each thread with a pool of preallocated such objects, which I use instead of calling the "new" function.

So I essentially implemented a separate pool of memory for each thread, but in a very crude way, which is very time consuming and actually not very good - I still have to use a lot of new for the initialization of these objects, only now I do it once globally and there is less contention on the GC thread, even when having to increase the size of the pool.

But this is definitely not a solution I like as it is not generalized easily and I wouldn't like to write my own memory manager.
Is there a way to tell .NET to allocate a predefined amount of memory for each thread, and then take all memory allocations from the local pool?

like image 409
Haggai Avatar asked May 25 '10 08:05

Haggai


People also ask

What is Microsoft Visual C used for?

Microsoft Visual C++ is a integrated development environment (IDE) used to create Windows applications in the C, C++, and C++/CLI programming languages.

What is vs2010?

Visual Studio 2010 SP1 enables you to manage Microsoft SQL Server Compact 4.0 SDF files in Solution Explorer and in Server Explorer in the context of web projects. Additionally, Visual Studio 2010 SP1 enables you to use SQL Server Compact 4.0 together with the Microsoft ASP.NET Web Forms in a SQL data source control.

How does C work in VS code?

Download & Install the C/C++ Extension We need to click on the extension button that displays a sidebar for downloading and installing the C/C++ extension in the visual studio code. In the sidebar, type C Extension. In this image, click on the Install button to install the C/C++ extension.

What is Microsoft Visual C++ 2010 redistributable and do I need it?

Microsoft Visual C++ Redistributable is a set of runtime library files used by many programs installed in the PC, even a part of Windows. Uninstalling them will definitely stop all those programs that depend on them from working. In such event, you will need to install the required version again.


1 Answers

Can you allocate less?

I've had a couple similar experiences, looking at bad perf and discovering the heart of the issue was the GC. In each case, though, I discovered that I was accidentally hemorrhaging memory in some inner loop, allocating tons of temporary objects needlessly. I'd give the code a careful look and see if there are allocations you can remove. I think it's rare for programs to 'need' to allocate heavily in inner loops.

like image 57
Brian Avatar answered Oct 02 '22 05:10

Brian