Could you tell me which one of those list joining methods is more efficient and why? Or is it the same performance-wise? Is there any other approach to this situation (joining 2 lists based on field property)?
My code:
public List<CombinedDatabaseStructure> Combine1(List<DatabaseStructure1> _Data1, List<DatabaseStructure2> _Data2)
{
List<CombinedDatabaseStructure> combinedAll = new List<CombinedDatabaseStructure>();
foreach (var row1 in _Data1)
{
foreach (var row2 in _Data2)
{
CombinedDatabaseStructure combined = new CombinedDatabaseStructure();
if (row1.ID == row2.ID)
{
combined.DatabaseStructure1 = row1;
combined.DatabaseStructure2 = row2;
combinedAll.Add(combined);
}
}
}
return combinedAll;
}
Code2:
public List<CombinedDatabaseStructure> Combine2(List<DatabaseStructure1> _Data1, List<DatabaseStructure2> _Data2)
{
var joined = from item1 in _Data1.AsEnumerable()
join item2 in _Data2.AsEnumerable() on item1.ID equals item2.ID
select new CombinedDatabaseStructure (item1,item2);
return joined.ToList<CombinedDatabaseStructure>();
}
As a general rule: If there's a built-in method in the .NET framework which does exactly what you want, it's usually a good idea to use it instead of re-implementing it. It's easier, more readable, less error-prone, better tested and usually more efficiently implemented.
Let's look at your particular problem in detail:
Option 1 is basically a manual (naive) implementation of a nested loop join with a complexity of O(n*m).
Option 2 uses LINQ-to-object's implementation of join, which internally uses a hash join, which has a complexity of O(n+m).
If you are worried about efficiency, I'd recommend "Option 3": let the database do the join. It can use statistics to choose the optimal join strategy for your data.
Note: Your nested loop implementation is very inefficient. It can be implemented with O(n*log(m)) complexity by using some kind of index lookup to find matching Data2 rows instead of the inner loop. In that case, a nested loop join can be faster than a hash join, if n is very small and m is large. This, however, assumes that the index already exists, since creating an index (for example, by creating a C# dictionary from your list) is a O(m) operation itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With