Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find duplicates in a array/list of integers

Given an array/list of integers, output the duplicates.

Also, what I am really looking for: what solutions have best time performance? Best space performance? Is it possible to have both best time and best space performance? Just curious. Thank you!

For example: given the list [4,1,7,9,4,5,2,7,6,5,3,6,7], the answer would be [4,7,6,5] (the order of the output does not matter).

I wrote up my solution in python.

Here's one solution I wrote using a hash and binary search.

def binarySearch(array, number):
    start = 0
    end = len(array)
    mid = (end + start) // 2
    while (end > start):
        mid = start + (end - start) // 2
        if array[mid] == number:
            return (mid, True)
        elif number > array[mid]:
            if start == mid:
                return (mid + 1, False)
                start = mid
            else:
                end = mid

    return (mid, False)

def findDuplicatesWithHash(array):
    duplicatesHash = {}
    duplicates = []
    for number in array:
        try:
            index,found = binarySearch(duplicates, number)
            if duplicatesHash[number] == 0 and not found: 
                duplicates.insert(index, number)
        except KeyError as error:
            duplicatesHash[number] = 0

    duplicatesSorted = sorted(duplicates, key=lambda tup: tup)
    return duplicatesSorted
like image 965
OhaiMac Avatar asked Nov 30 '25 04:11

OhaiMac


2 Answers

There are multiple solutions to finding duplicates. Given this question is completely generic, one can assume that given a list of n values, the number of duplicates lie in the range [0, n/2].

What are the possible methods you can think of?

  1. Hash Table approach:

    Store values while traversing the list if value already doesn't exist in the hash table. If the value, exists, you have a duplicate.

    Algorithm FindDuplicates(list)
    hash_table <- HashTable()
    duplicates <- List()
    for value in list:
        if value in hash_table:
            duplicates.add(value)
        else:
            hash_table.add(value, true)
    
    • Time: O(n) to traverse through all values
    • Space: O(n) to save all possible values in the hash table.
  2. Sort Array

    Sort the array and traverse neighbour values.

    Algorithm FindDuplicates(list)
    list.sort()
    duplicates <- Set()
    for i <- [1, len(list)-1]:
        if list[i] = list[i-1]:
            duplicates.add(list[i])
    
    • Time: O(n.logn) + O(n) = O(n.logn) to sort and traverse all values
    • Space: O(1) as no extra space created to produce duplicates
  3. Check for every value

    For every value check if the value exists in the array.

    Algorithm Search(i, list):
        for j <- [0, len(list)-1] - [i]:
            if list[j] = list[i]:
                return true
        return false
    
    Algorithm FindDuplicates(list)
    duplicates <- Set()
    for i <- [1, len(list)-1]:
        if Search(i, list):
            duplicates.add(list[i])
    

    Time: O(n^2) number of comparisons are n*n(-1) Space: O(1) as no extra space created to produce duplicates

Note: space for the duplicates array cannot be included in the space complexity equations as that is the result we want.

Can you think of some more?

like image 165
p0lAris Avatar answered Dec 02 '25 16:12

p0lAris


One way to get the duplicate:

l = [4,1,7,9,4,5,2,7,6,5,3,6]
import collections

print([item for item, count in collections.Counter(l).items() if count > 1])
like image 37
GAVD Avatar answered Dec 02 '25 17:12

GAVD