Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to find the duplicate numbers in an array ---Fastest Way

Tags:

c

algorithm

I need the fastest and simple algorithm which finds the duplicate numbers in an array, also should be able to know the number of duplicates.

Eg: if the array is {2,3,4,5,2,4,6,2,4,7,3,8,2}

I should be able to know that there are four 2's, two 3's and three 4's.

like image 891
Raviteja Avatar asked Dec 01 '22 11:12

Raviteja


2 Answers

Make a hash table where the key is array item and value is counter how many times the corresponding array item has occurred in array. This is efficient way to do it, but probably not the fastest way.

Something like this (in pseudo code). You will find plenty of hash map implementations for C by googling.

 hash_map = create_new_hash_map()
 for item in array {
   if hash_map.contains_key(item){
      counter = hash_map.get(item)
   } else {
      counter = 0
   }
   counter = counter + 1
   hash_map.put(item, counter)
 }
like image 186
Juha Syrjälä Avatar answered Dec 09 '22 20:12

Juha Syrjälä


This can be solved elegantly using Linq:

public static void Main(string[] args)
{
    List<int> list = new List<int> { 2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2 };

    var grouping = list
        .GroupBy(x => x)
        .Select(x => new { Item = x.Key, Count = x.Count()});

    foreach (var item in grouping)
        Console.WriteLine("Item {0} has count {1}", item.Item, item.Count);
}

Internally it probably uses hashing to partition the list, but the code hides the internal details - here we are only telling it what to calculate. The compiler / runtime is free to choose how to calculate it, and optimize as it sees fit. Thanks to Linq this same code will run efficiently whether run an a list in memory, or if the list is in a database. In real code you should use this, but I guess you want to know how internally it works.

A more imperative approach that demonstrates the actual algorithm is as follows:

    List<int> list = new List<int> { 2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2 };

    Dictionary<int, int> counts = new Dictionary<int, int>();
    foreach (int item in list)
    {
        if (!counts.ContainsKey(item))
        {
            counts[item] = 1;
        }
        else
        {
            counts[item]++;
        }
    }

    foreach (KeyValuePair<int, int> item in counts)
        Console.WriteLine("Item {0} has count {1}", item.Key, item.Value);

Here you can see that we iterate over the list only once, keeping a count for each item we see on the way. This would be a bad idea if the items were in a database though, so for real code, prefer to use the Linq method.

like image 29
Mark Byers Avatar answered Dec 09 '22 19:12

Mark Byers