 

Finding duplicate numbers in an array of numbers

I was asked this during an interview: given a list of numbers, return only the duplicates present in the input, as sorted output.

Example:

Input = [6, 7, 5, 6, 1, 0, 1, 0, 5, 3, 2]
Output = [0, 1, 5, 6] - the sorted, unique numbers that occur more than once in the input

I have come up with the solutions below:

Approach1:

public static List<Integer> process(List<Integer> input) {
    List<Integer> result = new ArrayList<>();
    Map<Integer, Integer> map = new HashMap<>();
    for (int val : input) {
        map.put(val, map.getOrDefault(val, 0) + 1);
    }

    map.forEach((key, val) -> {
        if (val > 1) {
            result.add(key);
        }
    });
    result.sort(null);
    return result;
}

Updated Approach2:

public static List<Integer> process1(List<Integer> input) {
    Set<Integer> dups = new HashSet<>();
    Set<Integer> set = new HashSet<>();
    for (int val : input) {
        if (set.contains(val)) {
            dups.add(val);
        } else {
            set.add(val);
        }
    }
    List<Integer> result = new ArrayList<>(dups);
    result.sort(null);
    return result;
}
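As a side note, the contains-then-add pair in Updated Approach2 can be collapsed by using the boolean that Set.add returns (false when the value was already present). A minimal sketch of that variant (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DupsViaAdd {
    // Same logic as Updated Approach2, but relying on Set.add's
    // return value instead of a separate contains() lookup.
    public static List<Integer> process(List<Integer> input) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> dups = new HashSet<>();
        for (int val : input) {
            if (!seen.add(val)) {   // already seen -> it's a duplicate
                dups.add(val);
            }
        }
        List<Integer> result = new ArrayList<>(dups);
        result.sort(null);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of(6, 7, 5, 6, 1, 0, 1, 0, 5, 3, 2)));
        // prints [0, 1, 5, 6]
    }
}
```

This avoids one hash lookup per element, though the asymptotic complexity is unchanged.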

Old Approach2

public static List<Integer> process1(List<Integer> input) {
    List<Integer> result = new ArrayList<>();
    Set<Integer> set = new HashSet<>();
    for (int val : input) {
        if (set.contains(val)) {
            result.add(val);
        } else {
            set.add(val);
        }
    }
    result.sort(null);
    return result;
}

Approach1's time complexity is O(n log n), since sorting in Java is O(n log n); its space complexity is O(n).

Approach2's time complexity is again O(n log n) because of the sort; its space usage is a little lower than Approach1's, since I hold each element only once in my sets.

Please correct me if my time and space complexities are wrong.

Now the question is whether this logic works if the input contains millions of numbers. Does a HashMap still work when the input is millions of numbers?

As I understand it, map and set operations are generally fast, and HashSet is internally implemented on top of HashMap. How should I answer this question?

asked Oct 11 '19 by learner




2 Answers

Approach2 (the old version) fails if a number is present three or more times, as it adds that number to the output multiple times. You're right about the lower space usage, but your reasoning is a little off: it's lower because HashSet internally stores the same shared dummy object as the value in its underlying HashMap to indicate presence, whereas in Approach1 you allocate a new Integer count on every update.
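The failure is easy to reproduce with a value that occurs three times (method name is mine; the body is the old Approach2 verbatim):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TripleDup {
    // The old Approach2: appends to the result list
    // every time a value is seen again, not just the first time.
    public static List<Integer> oldProcess(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int val : input) {
            if (set.contains(val)) {
                result.add(val);
            } else {
                set.add(val);
            }
        }
        result.sort(null);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(oldProcess(List.of(7, 7, 7)));
        // prints [7, 7] -- 7 is reported twice instead of once
    }
}
```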

A HashMap internally holds an array of buckets, so generally, if you're able to allocate a list containing a million numbers, you should also be able to allocate a HashMap holding (at most) as many entries.

It would be a good idea to set the initial capacity of the HashMap when constructing it to the size of the list. This will make your code faster for large lists since it avoids rehashing.
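For example, the counting map from Approach1 could be pre-sized like this (the sizing formula accounts for HashMap's default load factor of 0.75; class and method names are mine):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PresizedMap {
    public static Map<Integer, Integer> countOccurrences(List<Integer> input) {
        // Pre-size so the map never rehashes: HashMap resizes once its
        // size exceeds capacity * loadFactor (0.75 by default), so a bare
        // new HashMap<>(input.size()) could still trigger one resize.
        Map<Integer, Integer> counts = new HashMap<>((int) (input.size() / 0.75f) + 1);
        for (int val : input) {
            counts.merge(val, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts =
                countOccurrences(List.of(6, 7, 5, 6, 1, 0, 1, 0, 5, 3, 2));
        System.out.println(counts.get(6));  // prints 2
        System.out.println(counts.get(3));  // prints 1
    }
}
```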

Note that there may be a faster approach: sorting the initial list. In a sorted list, finding duplicates is trivial since they are adjacent, so you don't need a HashMap at all. However, if you're not allowed to modify the input, you need to copy it first, so the space requirements stay the same. The theoretical complexity also stays the same (sorting is O(n log n), finding the duplicates is O(n)). The actual time spent sorting will be higher, since you now sort the whole large list, but you avoid all the allocations inside the HashMap; this may or may not make up for the additional sorting time.

answered Oct 18 '22 by flyx


I was curious how different implementations of this algorithm would behave under JMH performance tests, and the fastest implementation I came up with is:

public static List<Integer> process(List<Integer> input) {
    Set<Integer> all = new HashSet<>(input.size());
    Set<Integer> output = new TreeSet<>();  // keeps duplicates sorted as we go

    for (Integer val : input) {
        if (!all.add(val)) {  // add() returns false if val was already present
            output.add(val);
        }
    }

    return new ArrayList<>(output);
}

Below are JMH results for the above implementation (algo2) and your Approach 1 implementation (algo1):

Benchmark                   (N)  Mode  Cnt    Score    Error  Units
PerformanceTests.algo1  1000000  avgt    3  323.265 ± 33.919  ms/op
PerformanceTests.algo2  1000000  avgt    3  285.505 ± 29.744  ms/op

Update: @josejuan, you were right; below is an algorithm 6x faster than the previous ones:

public static List<Integer> process(List<Integer> INPUT) {
    if (INPUT.isEmpty()) {
        return new ArrayList<>();
    }
    // copy into a primitive array so the sort works on unboxed ints
    int[] input = new int[INPUT.size()];
    for (int i = 0; i < input.length; i++) {
        input[i] = INPUT.get(i);
    }
    Arrays.sort(input);

    // scan the sorted array: duplicates are adjacent,
    // and the 'added' flag reports each duplicate only once
    List<Integer> output = new ArrayList<>(input.length);
    int prev = input[0];
    boolean added = false;
    for (int i = 1; i < input.length; i++) {
        if (prev == input[i]) {
            if (!added) {
                output.add(prev);
                added = true;
            }
        } else {
            added = false;
            prev = input[i];
        }
    }
    return output;
}
answered Oct 18 '22 by Adam Siemion