Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using IEnumerable.Except on KeyCollection vs. exploiting Dictionary.ContainsKey for mutual subtractions and intersection in relation to performance

I have two dictionaries Dictionary<string, object>. I need to find their intersection (I mean only their keys intersection) and A\B and B\A subtractions and make some actions to the objects (in fact my objects are EntityFramework entities and I have to mark their state as Modified, Added and Deleted respectively, though it's not very relevant to the question). Just imagine the simplest Venn diagram.

I want to do it the most efficient way. I think I have two choices:

1) Implement a set of generic extension methods that internally operate IEnumerable methods on KeyCollection like ExceptByKey, for example:

public static Dictionary<TKey, TValue> ExceptByKeys<TKey, TValue>(this Dictionary<TKey, TValue> dict1, Dictionary<TKey, TValue> dict2)
{
    return dict1.Keys.Except(dict2.Keys).ToDictionary(key => key, key => dict1[key]);
}

Then I could operate these methods to separately process each of three groups. From there I know that KeyCollection.Contains method internally uses the Dictionary<TKey, TValue>.ContainsKey method so both are O(1). So my Except method would then run in O(n), is that right? I would need to use it once for each dictionary and also somehow detect the intersected part, which can be done implicitly though by first iterating over all entities in one dictionary and marking them as belonging to the intersection. So, is it like O(n) + O(n + m)?

2) I could also iterate over my dictionaries calling ContainsKey method on the other dictionary for each element and doing the appropriate thing. This seems to me a better solution because I only get O(n + m) complexity.

So, the questions are: - am I right in my calculations? - is there a better way that I have not thought about to accomplish what I want?

UPDATE 19/06/2015

So I've chosen the second case and it works OK. Here's my implementation in the wild

using (var he = new HostEntities())
{
    var dbHardDrives = he.HardDrive.Where(_ => _.HostName == _address).ToDictionary(_ => _.Name, _ => _);
    foreach (var dbHd in dbHardDrives)
    {
        if (wmiHardDrives.ContainsKey(dbHd.Key))
        {
            he.Entry(dbHd.Value).State = EntityState.Detached;
            he.Entry(wmiHardDrives[dbHd.Key]).State = EntityState.Modified;
        }
        else
        {
            he.Entry(dbHd.Value).State = EntityState.Deleted;
        }
    }
    foreach (var wmiHd in wmiHardDrives)
    {
        if (!dbHardDrives.ContainsKey(wmiHd.Key))
        {
            he.Entry(wmiHd.Value).State = EntityState.Added;
        }
    }
    he.SaveChanges();
}
like image 325
Varvara Kalinina Avatar asked Nov 17 '25 17:11

Varvara Kalinina


1 Answers

Your reasoning looks sound to me. LINQs Except() iterates over the second collection putting it into a HashSet Set before iterating over the first collection, performing lookups against the Set - it is O(n + m). Your extension method is therefore O(n + m) too. As you mention, if you want to calculate the 3 sets of additions, deletions and intersections, you would have to call it multiple times, making option 2 more preferable.

You are trying to do an outer join, and be able to evaluate the left, inner and right items separately. For an O(n + m) solution you could use something like this

public static JoinResult<TKey> JoinKeys<TKey, TValue>(this IDictionary<TKey, TValue> first, IDictionary<TKey, TValue> second)
{
    var left = new List<TKey>();
    var inner = new HashSet<TKey>();    // HashSet to optimize lookups
    var right = new List<TKey>();

    foreach (var l in first.Keys)   // O(n)
    {
        if (second.ContainsKey(l))
            inner.Add(l);
        else
            left.Add(l);
    }

    foreach (var r in second.Keys)      // O(m)
    {
        if (!inner.Contains(r))
            right.Add(r);
    }

    return new JoinResult<TKey>
    {
        Left = left,
        Inner = inner,
        Right = right
    };
}

public class JoinResult<T>
{
    public IEnumerable<T> Left { get; set; }
    public IEnumerable<T> Inner { get; set; }
    public IEnumerable<T> Right { get; set; }
}
like image 114
Matt Cole Avatar answered Nov 19 '25 07:11

Matt Cole



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!