I have a large List<MyClass>
of objects, around 600000.
MyClass
has like 10 properties, let's say property1
, property2
, etc.. until property10
.
Out of that list, I want to get a List of List<MyClass>
with objects having the same value for some of the properties.
That means for example, objects where property2
, property4
, property8
and property10
are the same.
What is the best way to do that? Currently I do a loop over my List<MyClass>
, and within that loop I get all similar objects via List<MyClass>.FindAll()
, dummy code:
forach(var item in myClassList)
{
if(!found.Contains(item))
{
var similarObjects = myClassList.FindAll(x => x.property2 == item.property2 && x.property4 == item.property4 && x.property8 == item.property8 && x.property10 == item.property10);
//adding the objects to the "already found" list
foreach(var foundItem in similarOjbects)
{
found.Add(foundItem);
}
if(similarObjects.Count > 1)
{
similarObjectsList.Add(similarObjects);
}
}
}
But it takes ages, the List.FindAll()
method is very slow.
Is there a more efficient algorithm to do that?
You can use group by
to solve this quite efficiently:
var grouped =
from item in myClassList
group item
by new {item.Property2, item.Property4, item.Property8, item.Property10};
That will give you a sequence of groups where each group contains all the objects that have the same values for the specified properties.
As an example, to iterate over every item in each group of the resulting sequence of groups you could do something like this:
foreach (var group in grouped)
{
foreach (var item in group)
{
// Do something with item
}
}
Note that this assumes that the type of each property implements IEquatable<T>
and GetHashCode()
.
Here's a compilable example:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo
{
class Data
{
public string Name { get; set; }
public int Property1 { get; set; }
public int Property2 { get; set; }
public int Property3 { get; set; }
public int Property4 { get; set; }
public int Property5 { get; set; }
public int Property6 { get; set; }
public int Property7 { get; set; }
public int Property8 { get; set; }
public int Property9 { get; set; }
public int Property10 { get; set; }
}
class Program
{
static void Main(string[] args)
{
List<Data> myClassList = new List<Data>
{
new Data {Name = "1A", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
new Data {Name = "1B", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
new Data {Name = "1C", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
new Data {Name = "2A", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
new Data {Name = "2B", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
new Data {Name = "2C", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
new Data {Name = "3A", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
new Data {Name = "3B", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
new Data {Name = "3C", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
};
var grouped =
from item in myClassList
group item
by new {item.Property2, item.Property4, item.Property8, item.Property10};
foreach (var group in grouped)
{
Console.WriteLine(string.Join(", ", group.Select(item => item.Name)));
}
}
}
}
The example above outputs:
1A, 1B, 1C
2A, 2B, 2C
3A, 3B, 3C
Possible optimisation using PLINQ
As mentioned by @BertPersyn below, you could perhaps speed this up using PLINQ.
To do that, simply use the following to generate grouped
(note the addition of .AsParallel()
):
var grouped =
from item in myClassList.AsParallel()
group item
by new {item.Property2, item.Property4, item.Property8, item.Property10};
To determine if this actually speeds things up, it is imperative that you perform some timings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With