
Performance optimization of foreach loop in C#

I've got a method:

IList<string> pages = new List<string>();
foreach (var node in nodes)
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
        {
            pages.Add(node.Title);
        }
    }
    catch (Exception)
    {
        continue;
    }
}

DoSomeComplicatedModificationOnNode() throws an exception in some cases, which is why the try/catch block is used: it lets me skip the items that throw. nodes contains several thousand items, and each item has several properties. How can I optimize this loop? I was thinking about Parallel.ForEach, but the following code gives me the error "Missing current principal":

IList<string> pages = new List<string>();
Parallel.ForEach(pageNodes, node =>
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
        {
            pages.Add(node.Title);
        }
    }
    catch (Exception)
    {
    }
});

Steve Macculan asked Apr 25 '14 09:04


2 Answers

In C#, the generic List<T> is not thread-safe for writes, so you cannot safely add items to it from a parallel loop.

I recommend using a thread-safe collection from System.Collections.Concurrent instead, such as ConcurrentBag, ConcurrentStack or ConcurrentQueue.

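var pages = new ConcurrentBag<string>(); // thread-safe, unordered collection
Parallel.ForEach(pageNodes, node =>
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
            pages.Add(node.Title);
    }
    catch (Exception)
    {
        // Skip nodes that fail, as the original loop does. A bare "throw;"
        // here would stop the loop early and surface as an AggregateException.
    }
});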

Remember that parallel iterations complete in no particular order. If you need the original order, use Parallel.For with an index and store that index alongside each result. List<T> is thread-safe only for concurrent reads.

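// MyPage is a user-supplied helper (not shown) that pairs the index with the title;
// pages must be a concurrent collection of whatever MyPage returns.
System.Threading.Tasks.Parallel.For(0, pageNodes.Count, index =>
{
    var node = pageNodes[index]; // concurrent reads from the list are safe

    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
            pages.Add(MyPage(index, node.Title));
    }
    catch (Exception)
    {
        // Skip nodes that fail.
    }
});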
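
As a rough sketch of the index-based idea above, each iteration can write into its own slot of a pre-sized array, so no two threads touch the same element and the slots can be read back in input order afterwards. The Node class and the modify delegate are hypothetical stand-ins for the question's node type and DoSomeComplicatedModificationOnNode, and the sketch assumes titles are never null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's node type.
public sealed class Node
{
    public string Title { get; set; }
}

public static class OrderedFilter
{
    // modify stands in for DoSomeComplicatedModificationOnNode.
    public static List<string> MatchingTitles(
        IList<Node> nodes, Func<Node, string> modify, string path)
    {
        // One slot per node. Each iteration writes only to its own index,
        // so the array needs no locking.
        var slots = new string[nodes.Count];

        Parallel.For(0, nodes.Count, index =>
        {
            try
            {
                string temp = modify(nodes[index]);
                if (temp.ToLower().Contains(path))
                    slots[index] = nodes[index].Title;
            }
            catch (Exception)
            {
                // Skip nodes that fail; their slot stays null.
            }
        });

        // Reading the slots in index order restores the input order.
        return slots.Where(title => title != null).ToList();
    }
}
```

Calling OrderedFilter.MatchingTitles(pageNodes, DoSomeComplicatedModificationOnNode, path) would then return the matching titles in the same order as pageNodes, provided the node type is adapted to your own.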

Pablo Caballero answered Oct 29 '22 06:10


I would recommend using PLINQ for this. Parallel LINQ is a parallel implementation of LINQ and offers the same set of operations. Code written with PLINQ follows a functional style: there are no updates to shared state, just a mapping of the current list performed in parallel. It can improve performance in your case by running the mapper on several threads and then gathering the results into a single "dataset". Of course, it can only help if your CPU has more than one core, which is the norm nowadays.

Here is an example:

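    // Requires: using System; using System.Collections.Generic;
    //           using System.Globalization; using System.Linq;
    private static void Main(string[] args)
    {
        var result =
            GenerateList()
                .AsParallel()                               // run the mapping on several threads
                .Select(MapToString)                        // null means "no match" or "failed"
                .Where(x => !String.IsNullOrWhiteSpace(x))  // drop the nulls
                .ToList();

        Console.ReadKey();
    }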

    private const string Path = "1";
    private static string MapToString( int node)
    {
        //Console.WriteLine("Thread id: {0}", Thread.CurrentThread.ManagedThreadId);
        try
        {
            string temp = DoSomeComplicatedModificationOnNode(node);
            if (temp.ToLower().Contains(Path))
            {
                return temp;
            }
        }
        catch (Exception)
        {
            return null;
        }

        return null;
    }
    private static IEnumerable<int> GenerateList()
    {
        for (var i=0; i <= 10000; i++)
            yield return i;
    }

    private static string DoSomeComplicatedModificationOnNode(int node)
    {
        return node.ToString(CultureInfo.InvariantCulture);
    }
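
The same idea might be adapted to the shape of the question roughly as follows. This is only a sketch: PageNode and the modify delegate are hypothetical stand-ins for the real node type and DoSomeComplicatedModificationOnNode. AsOrdered() is added here so the titles come back in source order, which plain AsParallel() does not guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's node type.
public sealed class PageNode
{
    public string Title { get; set; }
}

public static class PlinqFilter
{
    // modify stands in for DoSomeComplicatedModificationOnNode.
    public static List<string> MatchingTitles(
        IEnumerable<PageNode> nodes, Func<PageNode, string> modify, string path)
    {
        return nodes
            .AsParallel()
            .AsOrdered()                                 // keep results in source order
            .Where(node => Matches(node, modify, path))  // the expensive call runs in parallel
            .Select(node => node.Title)
            .ToList();
    }

    // Returns false for nodes whose modification throws, so they are skipped.
    private static bool Matches(
        PageNode node, Func<PageNode, string> modify, string path)
    {
        try
        {
            return modify(node).ToLower().Contains(path);
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

If the order of the titles does not matter, dropping AsOrdered() avoids the cost of keeping results in source order.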

Igor Tkachenko answered Oct 29 '22 08:10