Find duplicates in a array/list of integers

Question

Given an array/list of integers, output the duplicates.

Also, what I am really looking for: what solutions have best time performance? Best space performance? Is it possible to have both best time and best space performance? Just curious. Thank you!

For example: given the list [4,1,7,9,4,5,2,7,6,5,3,6,7], the answer would be [4,7,6,5] (the order of the output does not matter).

I wrote up my solution in python.

Here's one solution I wrote using a hash and binary search.

def binarySearch(array, number):
    start = 0
    end = len(array)
    mid = (end + start) // 2
    while (end > start):
        mid = start + (end - start) // 2
        if array[mid] == number:
            return (mid, True)
        elif number > array[mid]:
            if start == mid:
                return (mid + 1, False)
                start = mid
            else:
                end = mid

    return (mid, False)

def findDuplicatesWithHash(array):
    duplicatesHash = {}
    duplicates = []
    for number in array:
        try:
            index,found = binarySearch(duplicates, number)
            if duplicatesHash[number] == 0 and not found: 
                duplicates.insert(index, number)
        except KeyError as error:
            duplicatesHash[number] = 0

    duplicatesSorted = sorted(duplicates, key=lambda tup: tup)
    return duplicatesSorted

p0lAris · Accepted Answer

There are multiple solutions to finding duplicates. Given this question is completely generic, one can assume that given a list of n values, the number of duplicates lie in the range [0, n/2].

What are the possible methods you can think of?

Hash Table approach:

Store values while traversing the list if value already doesn't exist in the hash table. If the value, exists, you have a duplicate.
```
Algorithm FindDuplicates(list)
hash_table <- HashTable()
duplicates <- List()
for value in list:
    if value in hash_table:
        duplicates.add(value)
    else:
        hash_table.add(value, true)
```
- Time: O(n) to traverse through all values
- Space: O(n) to save all possible values in the hash table.
Sort Array

Sort the array and traverse neighbour values.
```
Algorithm FindDuplicates(list)
list.sort()
duplicates <- Set()
for i <- [1, len(list)-1]:
    if list[i] = list[i-1]:
        duplicates.add(list[i])
```
- Time: O(n.logn) + O(n) = O(n.logn) to sort and traverse all values
- Space: O(1) as no extra space created to produce duplicates

Check for every value

For every value check if the value exists in the array.

Algorithm Search(i, list):
    for j <- [0, len(list)-1] - [i]:
        if list[j] = list[i]:
            return true
    return false

Algorithm FindDuplicates(list)
duplicates <- Set()
for i <- [1, len(list)-1]:
    if Search(i, list):
        duplicates.add(list[i])

Time: O(n^2) number of comparisons are n*n(-1) Space: O(1) as no extra space created to produce duplicates

Note: space for the duplicates array cannot be included in the space complexity equations as that is the result we want.

Can you think of some more?

GAVD · Answer

One way to get the duplicate:

l = [4,1,7,9,4,5,2,7,6,5,3,6]
import collections

print([item for item, count in collections.Counter(l).items() if count > 1])

Find duplicates in a array/list of integers

Tags:

python

arrays

algorithm

list

sorting

OhaiMac

2 Answers

p0lAris

GAVD

Recent Activity

Donate For Us

Find duplicates in a array/list of integers

Tags:

python

arrays

algorithm

list

sorting

OhaiMac

2 Answers

p0lAris

GAVD

Related questions

Recent Activity

Donate For Us