
Parallel.ForEach with Entity Framework: performance is dramatically worse than launching multiple executables. Why?

I am wondering if any of you know why my performance is terrible.

What I am trying to achieve: generate 2.2 million files. Creating each file requires 2-5 database calls on average.

The server I am working on has 24 cores and 190GB of RAM.

I divided the files I need to generate into 24 batches.

When I use the following code, I get lousy performance. The generation process takes over an hour.

Parallel.ForEach(batches, batch =>
{
    // One context per batch, so each of the 24 parallel tasks has its own.
    using (var ctx = new MyContext())
    {
        foreach (var file in batch.Files)
        {
            GenerateFile(file);
        }
    }
});

However, when I give my program a parameter that tells it which batch to generate, I don't need the parallel functionality at all. I then execute the program once per batch with the following .bat file:

START CaMaakEiBericht.exe \B1
START CaMaakEiBericht.exe \B2
...
START CaMaakEiBericht.exe \B24

It runs amazingly fast! The total generation process takes less than 15 minutes! With this batch file, each core also shows CPU usage of around 90%. When I use the Parallel approach, I only get 30-40% usage.

Does someone have a logical explanation for this? I was pleased with this project because I finally had the possibility to use the .NET 4 Parallel library in combination with EF but unfortunately, it kinda disappointed me :-)

I personally have a slight suspicion that EF is the bottleneck here... Does it cache some stuff internally which imposes some locks when multiple processes are fetching data?

Enlighten me :-)

Ben Thaens asked Feb 10 '12 14:02



1 Answer

I can't speak as to why your other EXE file works well, but I can offer a suggestion for the code that you present.

You mentioned that you split your work up into 24 batches, then you used ForEach over the list of batches. With this setup, it would seem that each of your 24 cores can only be working on 1 file at a time. My guess is that this is your bottleneck.

Each core could be doing a lot more if you let it. Try something like this:

Parallel.ForEach(batches, batch =>
{
    Parallel.ForEach(batch.Files, file =>
    {
        // A fresh context per file, because the context is not thread safe.
        using (var ctx = new MyContext())
        {
            GenerateFile(file);
        }
    });
});

Or you could just get rid of the batches entirely and give it the full list of files. The Task Parallel Library will take care of using multiple cores for you.

Parallel.ForEach(Files, file => 
{
    using (var ctx = new MyContext())
    {
        GenerateFile(file);
    }     
});

You probably already know this, but keep in mind that the context is not thread safe, so you have to create a new one inside the inner-most Parallel.ForEach structure.
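If creating a context for every one of 2.2 million files turns out to be expensive, one option worth measuring is the Parallel.ForEach overload that takes thread-local state, so each worker task creates one context and reuses it. The following is only a sketch under stated assumptions: DemoContext is a hypothetical stand-in for MyContext, this GenerateFile takes the context as a parameter (the original signature does not), and the MaxDegreeOfParallelism value is a starting point to tune rather than a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the EF context; it only counts how many were created.
sealed class DemoContext : IDisposable
{
    public static int Created;
    public DemoContext() { Interlocked.Increment(ref Created); }
    public void Dispose() { }
}

static class Program
{
    // Hypothetical stand-in for GenerateFile; the context is passed in explicitly.
    static void GenerateFile(DemoContext ctx, int file)
    {
        // The 2-5 database calls and the file write would go here.
    }

    static void Main()
    {
        IEnumerable<int> files = Enumerable.Range(0, 10000);
        var options = new ParallelOptions
        {
            // Assumption: cap at the core count; tune after measuring.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        int processed = 0;

        Parallel.ForEach(
            files,
            options,
            () => new DemoContext(),          // localInit: runs once per worker task
            (file, loopState, ctx) =>
            {
                GenerateFile(ctx, file);
                Interlocked.Increment(ref processed);
                return ctx;                   // pass the same context to the next iteration
            },
            ctx => ctx.Dispose());            // localFinally: runs once per worker task

        Console.WriteLine("files processed: " + processed);
        Console.WriteLine("contexts created: " + DemoContext.Created);
    }
}
```

Each context is still only ever touched by one thread at a time, so the thread-safety rule above holds. The trade-off is that a long-lived context can accumulate tracked entities, so compare both variants on real data before settling on one.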

CodeThug answered Nov 15 '22 18:11
