Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find objects with same properties in large List - performance slow

Tags:

c#

list

I have a large List<MyClass> of objects, around 600000. MyClass has like 10 properties, let's say property1, property2, etc.. until property10.

Out of that list, I want to get a List of List<MyClass> with objects having the same value for some of the properties.

That means for example, objects where property2, property4, property8 and property10 are the same.

What is the best way to do that? Currently I do a loop over my List<MyClass>, and within that loop I get all similar objects via List<MyClass>.FindAll(), dummy code:

forach(var item in myClassList)
{
   if(!found.Contains(item))
   {
      var similarObjects = myClassList.FindAll(x => x.property2 == item.property2 && x.property4 == item.property4 && x.property8 == item.property8 && x.property10 == item.property10);

      //adding the objects to the "already found" list
      foreach(var foundItem in similarOjbects)
      {
         found.Add(foundItem);
      }

     if(similarObjects.Count > 1)
     {
        similarObjectsList.Add(similarObjects);
     }
   }
}

But it takes ages, the List.FindAll() method is very slow.

Is there a more efficient algorithm to do that?

like image 980
flo Avatar asked Mar 11 '23 18:03

flo


1 Answers

You can use group by to solve this quite efficiently:

var grouped =
    from item in myClassList
    group item 
    by new {item.Property2, item.Property4, item.Property8, item.Property10};

That will give you a sequence of groups where each group contains all the objects that have the same values for the specified properties.

As an example, to iterate over every item in each group of the resulting sequence of groups you could do something like this:

foreach (var group in grouped)
{
    foreach (var item in group)
    {
        // Do something with item
    }
}

Note that this assumes that the type of each property implements IEquatable<T> and GetHashCode().

Here's a compilable example:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    class Data
    {
        public string Name { get; set; }
        public int Property1  { get; set; }
        public int Property2  { get; set; }
        public int Property3  { get; set; }
        public int Property4  { get; set; }
        public int Property5  { get; set; }
        public int Property6  { get; set; }
        public int Property7  { get; set; }
        public int Property8  { get; set; }
        public int Property9  { get; set; }
        public int Property10 { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<Data> myClassList = new List<Data>
            {
                new Data {Name = "1A", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1B", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1C", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "2A", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2B", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2C", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "3A", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3B", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3C", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
            };

            var grouped =
                from item in myClassList
                group item 
                by new {item.Property2, item.Property4, item.Property8, item.Property10};

            foreach (var group in grouped)
            {
                Console.WriteLine(string.Join(", ", group.Select(item => item.Name)));
            }
        }
    }
}

The example above outputs:

1A, 1B, 1C
2A, 2B, 2C
3A, 3B, 3C

Possible optimisation using PLINQ

As mentioned by @BertPersyn below, you could perhaps speed this up using PLINQ.

To do that, simply use the following to generate grouped (note the addition of .AsParallel()):

var grouped = 
    from item in myClassList.AsParallel()
    group item 
    by new {item.Property2, item.Property4, item.Property8, item.Property10};

To determine if this actually speeds things up, it is imperative that you perform some timings.

like image 77
Matthew Watson Avatar answered May 06 '23 10:05

Matthew Watson