
parallel linq: AsParallel().forAll() nulls some objects

Tags: c#, .net, linq, plinq

So, I've got a very weird situation here: a PLINQ ForAll() query seems to drop some of my custom objects and, to be honest, I have no clue why.

var myArticles = data.FilterCustomerArticles([]params]).ToList(); //always returns 201 articles

result.Articles = new List<ArticleMinimal>();

try
{
    myArticles.AsParallel().ForAll(article =>
    {
        result.Articles.Add(new ArticleMinimal()
        {
            ArticleNumber = article.ArticleNumber,
            Description = article.Description,
            IsMaterial = false,
            Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
            Quantity = 1,
            ValidFrom = new DateTime(1900, 1, 1),
            ValidTo = new DateTime(2222, 1, 1)
        });
    });

}
catch (Exception ex)
{
    ...
}

The code above returns a different result count nearly every time I call it. It should return 201 ArticleMinimal objects. Instead, it returns 200, 189, 19x... and only from time to time 201. No exception is thrown; it just returns fewer objects than it should.

After changing the code to a good old classic foreach loop, I always get the expected 201 objects.

Working Code:

var myArticles = data.FilterCustomerArticles([]params]).ToList(); //always returns 201 articles

result.Articles = new List<ArticleMinimal>();

try
{
    foreach (var article in myArticles)
    {
        result.Articles.Add(new ArticleMinimal()
        {
            ArticleNumber = article.ArticleNumber,
            Description = article.Description,
            IsMaterial = false,
            Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
            Quantity = 1,
            ValidFrom = new DateTime(1900, 1, 1),
            ValidTo = new DateTime(2222, 1, 1)
        });
    }

}
catch (Exception ex)
{
    ...
}

Additionally, a few lines further down, I have another ForAll like this:

try
{
    result.Articles.AsParallel().ForAll(article =>
    {
        if (article.Weight != null)
        {
            ...
        }
    });
}
catch (Exception)
{
    ...
}

When the list was built with the first ForAll, this throws a NullReferenceException; imho because it expects 201 objects, but some of the list entries are null.

Now my actual question is: why does the first ForAll return fewer objects than it should? The only clue I can think of is the inline initialization new ArticleMinimal() { ... }, but even if that's the cause, it seems weird to me. Is it not possible to do this with PLINQ? I'm just guessing here.

Hope you can help.

Best regards, Dom

Dominik asked Jan 08 '16 10:01


2 Answers

You cannot modify result.Articles from many threads at once: List<T> is not thread-safe, so concurrent Add calls can corrupt its internal state, which is what you are observing (lost items and null entries).

Instead, turn your parallel workflow into a pipeline that returns the created objects:

result.Articles.AddRange(myArticles.AsParallel().Select(article =>
    new ArticleMinimal()
    {
        ArticleNumber = article.ArticleNumber,
        Description = article.Description,
        IsMaterial = false,
        Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
        Quantity = 1,
        ValidFrom = new DateTime(1900, 1, 1),
        ValidTo = new DateTime(2222, 1, 1)
    })
);

Since the .Select here is called on the ParallelQuery returned by .AsParallel(), it runs in parallel over the items.

The .AddRange, however, will ask for ParallelQuery.GetEnumerator(), which returns the items merged into one single sequence, giving you what you want.

The fundamental difference is that .AddRange() will likely not add anything until the parallel tasks have started to complete, whereas your approach, with the appropriate locking added, would add items to the collection as they are produced. However, unless you want to observe items flowing into the collection as they are produced, this is unlikely to matter in your case.
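As a minimal, self-contained sketch of the pipeline idea described in this answer: the Article and ArticleMinimal classes below are cut-down, hypothetical stand-ins for the types in the question, and PipelineSketch.Convert is an illustrative name, not something from the original code. It uses .ToList() instead of .AddRange(), which should behave the same for this purpose, and adds .AsOrdered() on the assumption that the source order matters; PLINQ does not preserve order by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types; only the fields
// needed for the illustration are included.
public class Article
{
    public string ArticleNumber { get; set; } = string.Empty;
    public decimal? PortionPrice { get; set; }
}

public class ArticleMinimal
{
    public string ArticleNumber { get; set; } = string.Empty;
    public decimal Price { get; set; }
}

public static class PipelineSketch
{
    // Projects every Article to an ArticleMinimal in parallel and lets
    // PLINQ collect the results, so no user code mutates a shared list
    // from several threads.
    public static List<ArticleMinimal> Convert(IEnumerable<Article> articles)
    {
        return articles
            .AsParallel()
            .AsOrdered() // optional: keep the source order in the output
            .Select(a => new ArticleMinimal
            {
                ArticleNumber = a.ArticleNumber,
                Price = a.PortionPrice ?? decimal.Zero
            })
            .ToList();
    }
}
```

Calling PipelineSketch.Convert(myArticles) on a 201-element list should then yield 201 non-null entries on every run. Dropping .AsOrdered() may be slightly faster if the order of the result does not matter.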

Lasse V. Karlsen answered Nov 02 '22 23:11


List<T>.Add is not thread-safe. Please refer to https://stackoverflow.com/a/8796528/98491

Use either a lock

lock (result.Articles)
{
    result.Articles.Add(...);
}

or a thread-safe collection. I would use a temporary collection and call result.Articles.AddRange(...) once at the end.
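Both options could look roughly like the sketch below. It is only an illustration under some assumptions: the items are plain strings instead of ArticleMinimal objects, ThreadSafeAddSketch and its method names are made up for this example, and the lock is taken on a private object rather than on the list itself, which is a common preference and not a requirement. ConcurrentBag<T> is just one of the thread-safe collections in System.Collections.Concurrent.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class ThreadSafeAddSketch
{
    // Option 1: keep List<T>, but serialize every Add with a lock.
    public static List<string> WithLock(IEnumerable<int> source)
    {
        var results = new List<string>();
        var gate = new object(); // private lock object (hypothetical name)

        source.AsParallel().ForAll(n =>
        {
            var item = "A" + n; // work outside the lock still runs in parallel
            lock (gate)
            {
                results.Add(item); // only one thread at a time touches the list
            }
        });

        return results;
    }

    // Option 2: collect into a thread-safe temporary collection,
    // then copy into the list once, on a single thread.
    public static List<string> WithConcurrentBag(IEnumerable<int> source)
    {
        var bag = new ConcurrentBag<string>();
        source.AsParallel().ForAll(n => bag.Add("A" + n));

        var results = new List<string>();
        results.AddRange(bag);
        return results;
    }
}
```

Neither variant guarantees that the result is in source order, so sort afterwards if the order matters.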

Jürgen Steinblock answered Nov 02 '22 22:11