Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best Algorithm for Removing Duplicate Values from a List

What is best algorithm for removing duplicate values from a list ? I've tried this:

for (int i = 0; i < AuthorCounter-1; i++)
{
    for (int j = 0; j < AuthorCounter-1; j++)
    {
        if (i != j)
        {
            if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[j].Text)
            {
                AuthorGroupNode.Nodes[j].Remove();
                AuthorCounter--;
            }

        }
    }
}

Here, AuthorGroupNodes is a list on nodes. It did things right to some extent, but not perfect. Any one have better solution ???

like image 355
Itz.Irshad Avatar asked Jul 17 '12 04:07

Itz.Irshad


2 Answers

Your current algorithm is O(N-squared), which will perform quite poorly for a large list.

If space is not an issue, you could keep a HashSet<int> of hashes of nodes. Traverse the list once. If the hash of the node is in the HashSet, you know this is a duplicate node. Skip it. If the hash is not in the HashSet, add this node to a new list, and add the hash of the node to the HashSet.

This will perform O(N), and requires memory for the original list, for a copy of the list less any duplicates, and for the HashSet. The algorithm is non-destructive.

If you can use Linq, simply do

var distinctList = originalList.Distinct().ToList();

UPDATE

Discovered that's pretty much exactly how Jon Skeet re-implemented Distinct.

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source) 
{ 
    return source.Distinct(EqualityComparer<TSource>.Default); 
} 

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    if (source == null)  
    { 
        throw new ArgumentNullException("source"); 
    } 
    return DistinctImpl(source, comparer ?? EqualityComparer<TSource>.Default); 
} 

private static IEnumerable<TSource> DistinctImpl<TSource>( 
    IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    HashSet<TSource> seenElements = new HashSet<TSource>(comparer); 
    foreach (TSource item in source) 
    { 
        if (seenElements.Add(item)) 
        { 
            yield return item; 
        } 
    } 
}

https://codeblog.jonskeet.uk/2010/12/30/reimplementing-linq-to-objects-part-14-distinct/

like image 113
Eric J. Avatar answered Oct 20 '22 01:10

Eric J.


This works like a treat:

var xs = new []
{
    2, 3, 2, 4, 3, 3, 5, 6,
};

var ys = xs
    .ToLookup(z => z, z => z)
    .Select(x => x.First());

For your code it would look something like this:

var nodes = AuthorGroupNode.Nodes
    .ToLookup(z => z.Text, z => z)
    .Select(x => x.First())
    .ToArray();

Can't be much simpler than that. :-)

like image 4
Enigmativity Avatar answered Oct 20 '22 02:10

Enigmativity