With the following data:
string[] data = { "a", "a", "b" };
I'd very much like to find duplicates and get this result:
a
I tried the following code:
var a = data.Distinct().ToList();
var b = a.Except(a).ToList();
Obviously this didn't work. I can see what is happening above, but I'm not sure how to fix it.
When runtime is no problem, you could use
var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();
Good old O(n²), since data.Count(t => t == s) re-scans the whole sequence for every element. =)
Edit: Now for a better solution. =) If you define a new extension method like
using System.Collections.Generic;

static class Extensions
{
    // Yields every element that has already appeared earlier in the sequence,
    // so an element occurring n times is yielded n - 1 times.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        HashSet<T> hash = new HashSet<T>();
        foreach (T item in input)
        {
            if (!hash.Contains(item))
            {
                hash.Add(item); // first time we see this item
            }
            else
            {
                yield return item; // seen before: it's a duplicate
            }
        }
    }
}
you can use
var duplicates = data.Duplicates().Distinct().ToArray();
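With the sample data from the question, this gives exactly the single expected value (a minimal usage sketch, reusing the data array from above):
string[] data = { "a", "a", "b" };
// "a" is yielded once by Duplicates() when it is seen the second time;
// Distinct() collapses any further repeats of more heavily duplicated items.
string[] duplicates = data.Duplicates().Distinct().ToArray(); // contains only "a"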
Use the group-by stuff; the performance of these methods is reasonably good. The only concern is the large memory overhead if you are working with big data sets.
var duplicateKeys =
    from g in (from x in data group x by x)
    where g.Count() > 1
    select g.Key;
Or, if you prefer extension methods:
var duplicateKeys = data.GroupBy(x => x)
                        .Where(g => g.Count() > 1)
                        .Select(g => g.Key);
Where Count() == 1, those are your distinct items; where Count() > 1, those are the duplicated items.
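For instance, with the sample data both buckets can be pulled from one grouping; this is just an illustrative method-syntax sketch (the variable names are made up):
var grouped = data.GroupBy(x => x);
var distinctItems  = grouped.Where(g => g.Count() == 1).Select(g => g.Key); // "b"
var duplicateItems = grouped.Where(g => g.Count() > 1).Select(g => g.Key);  // "a"
// note: enumerating grouped twice runs the grouping twice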
Since LINQ is lazily evaluated, if you don't want to re-evaluate the grouping each time you can do this:
var g = (from x in data group x by x).ToList(); // grouping result, evaluated once

// duplicates
var duplicates = from x in g
                 where x.Count() > 1
                 select x.Key;

// distinct
var distinct = from x in g
               where x.Count() == 1
               select x.Key;
When creating the grouping, a set of sets is built. Assuming a set with O(1) insertion, the running time of the group-by approach is O(n). The cost incurred per operation is somewhat high, but it should equate to near-linear performance.
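If the memory overhead of materializing the groups worries you, here is a rough hand-rolled sketch of the same counting idea using a single dictionary (this is not how GroupBy is implemented, just an illustration of the near-linear cost; it assumes the usual System.Linq and System.Collections.Generic usings):
var counts = new Dictionary<string, int>();
foreach (var item in data)
{
    counts.TryGetValue(item, out int c); // c stays 0 when the key is new
    counts[item] = c + 1;
}
// keys seen more than once are the duplicates: just "a" for the sample data
var duplicates = counts.Where(kv => kv.Value > 1).Select(kv => kv.Key).ToList();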