I have the following code, which I would like to speed up:
foreach (var workflow in closedWorkflows)
{
    var w = relatedWorkflows.FirstOrDefault(r => r.Fingerprint != workflow.Fingerprint && r.TradeDate == workflow.TradeDate);
    if (w != null)
    {
        relatedWorkflows.Add(workflow);
    }
}
relatedWorkflows and closedWorkflows are list of same type of objects.
I thought about creating lookup or dictionary with an anonymous class or tuple on Fingerprint and TradeDate, but one check is for equality and other one is for inequality.
Would it be a good approach to create lookups on TradeDate, and then create lookups on Fingerprint  for each lookup list of TradeDate ?
Updated:
A BIG thanks to @Dmitry Bychenko.
In my test the difference is 41486 ms vs 26ms !!!
The main problem with performance is with
var w = relatedWorkflows.FirstOrDefault(...)
we scan relatedWorkflows again and again, that's why we have O(closedWorkflows.Count * relatedWorkflows.Count) time complexity.
We can get rid of relatedWorkflows.Count with the help of hash-based collections (dictionary and hashset), e.g.
var dict = workflow
  .GroupBy(item => item.TradeDate, item => item.FingerPrint)
  .ToDictionary(group => group.Key, group => group.ToHashSet());
foreach (var workflow in closedWorkflows) {
  // If we have a date, but NOT print we add it into relatedWorkflows
  if (dict.TryGetValue(workflow.TradeDate, out var prints) && 
      prints.Add(workflow.Fingerprint)) {
    relatedWorkflows.Add(workflow);
  }
}
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With