 

Why is using sorting (O(n log n) complexity) to find the majority element faster than using a HashMap (O(n) complexity)?

Majority element question:

Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times. You may assume that the array is non-empty and the majority element always exists in the array.

// Solution1 - Sorting ----------------------------------------------------------------
import java.util.Arrays;

class Solution {
    public int majorityElement(int[] nums) {
        // After sorting, the element at the middle index must be the majority element
        Arrays.sort(nums);
        return nums[nums.length/2];
    }
}

// Solution2 - HashMap ---------------------------------------------------------------
import java.util.HashMap;
import java.util.Map;

class Solution {
    public int majorityElement(int[] nums) {
        // int[] arr1 = new int[nums.length];
        HashMap<Integer, Integer> map = new HashMap<>(100);  
        Integer k = new Integer(-1);
        try{
            for(int i : nums){
                if(map.containsKey(i)){
                    map.put(i, map.get(i)+1);
                }
                else{
                    map.put(i, 1);
                }
            }
            for(Map.Entry<Integer, Integer> entry : map.entrySet()){
                if(entry.getValue()>(nums.length/2)){
                    k = entry.getKey();
                    break;
                }
            }
        }catch(Exception e){
            throw new IllegalArgumentException("Error");
        }
        return k;    
    }
}

For primitive arrays, the Arrays.sort() function is implemented in Java using dual-pivot Quicksort and has O(n log n) time complexity.

On the other hand, using HashMap to find the majority element has only O(n) time complexity.

Hence, solution 1 (sorting) should take longer than solution 2 (HashMap), but when I was doing the question on LeetCode, the average time taken by solution 2 was much longer (almost 8 times longer) than solution 1.

Why is that the case? I'm really confused.....

Is the size of the test case the reason? Will solution 2 become more efficient when the number of elements in the test case increases dramatically?

asked Jun 08 '20 by Y.Wang


People also ask

What is the time complexity of the majority element program if the data structure used is a balanced BST?

Time complexity = O(n log n). Space complexity = O(log n) for the recursion call stack.
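
As an illustration (my own sketch, not part of the answer above): counting occurrences in a TreeMap, which is Java's red-black tree and hence a balanced BST, gives the O(n log n) time described. The class name MajorityViaBst is made up.

import java.util.Map;
import java.util.TreeMap;

class MajorityViaBst {
    // Each insertion/lookup in the red-black tree costs O(log n),
    // so counting all n elements is O(n log n).
    static int majorityElement(int[] nums) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : nums) {
            counts.merge(x, 1, Integer::sum);
        }
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > nums.length / 2) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("no majority element");
    }
}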

How do you find the majority element in divide and conquer?

Divide and conquer (linearithmic time): rather than counting occurrences for all the values, just count occurrences for the majority elements in each half of the list. And as a bonus: if each half has the same majority element, then that's our majority element for the whole list.
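
A minimal sketch of that divide-and-conquer idea (my own illustration; the class name MajorityDivideConquer is made up, and it assumes a majority element exists):

// Divide-and-conquer sketch: T(n) = 2T(n/2) + O(n) = O(n log n)
class MajorityDivideConquer {
    static int majorityElement(int[] nums) {
        return majority(nums, 0, nums.length - 1);
    }

    private static int majority(int[] nums, int lo, int hi) {
        if (lo == hi) {
            return nums[lo];            // a single element is its own majority
        }
        int mid = lo + (hi - lo) / 2;
        int left = majority(nums, lo, mid);
        int right = majority(nums, mid + 1, hi);
        if (left == right) {
            return left;                // both halves agree on the candidate
        }
        // Halves disagree: count each candidate over the whole range
        return count(nums, left, lo, hi) > count(nums, right, lo, hi) ? left : right;
    }

    private static int count(int[] nums, int value, int lo, int hi) {
        int c = 0;
        for (int i = lo; i <= hi; i++) {
            if (nums[i] == value) c++;
        }
        return c;
    }
}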

How do you solve the majority element problem?

The basic solution is to use two nested loops and keep track of the maximum count over all distinct elements. If the maximum count becomes greater than n/2, break out of the loops and return the element with the maximum count. If the maximum count never exceeds n/2, the majority element doesn't exist.
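
A sketch of that basic two-loop approach (my own illustration, O(n^2) time; the class name MajorityBruteForce is made up):

// Brute-force sketch: for each element, count how often it occurs
class MajorityBruteForce {
    static int majorityElement(int[] nums) {
        int n = nums.length;
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (int j = 0; j < n; j++) {
                if (nums[j] == nums[i]) count++;
            }
            if (count > n / 2) {
                return nums[i];          // occurs more than n/2 times
            }
        }
        throw new IllegalArgumentException("no majority element");
    }
}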

How do you find a dominant number in an array?

A dominant number in an array is an integer that occurs more than N/3 times in the array, where N is the array's length.
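
A hedged sketch of one way to find such a dominant number, simply counting occurrences in a map and checking the N/3 threshold (the class name DominantNumber is made up):

import java.util.HashMap;
import java.util.Map;

class DominantNumber {
    // Returns a value occurring more than N/3 times, or null if none exists
    static Integer findDominant(int[] arr) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : arr) {
            counts.merge(x, 1, Integer::sum);
        }
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > arr.length / 3) {
                return e.getKey();
            }
        }
        return null;   // no dominant number
    }
}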


2 Answers

Big O isn't a measure of actual performance. It only gives you an idea of how your running time will evolve as n grows.

In practice, an algorithm in O(n log n) will eventually be slower than one in O(n) beyond some n. But that n might be 1, 10, 10^6 or even 10^600 - at which point it's probably irrelevant because you'll never run into such a data set, or you won't have enough hardware for it.

Software engineers have to consider both actual performance and performance at the practical limit. For example, hash map lookup is in theory faster than unsorted array lookup... but then most arrays are small (10-100 elements), negating any big-O advantage due to the extra code complexity.

You could certainly optimize your code a bit, but in this case you're unlikely to change the outcome for small n unless you introduce another factor (e.g. artificially slow down the time per cycle with a constant).

(I wanted to find a good metaphor to illustrate, but it's harder than expected...)
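
To make the constant-factor point concrete, here is a rough timing sketch of my own (not part of the answer), comparing simplified versions of the two approaches. The array sizes, the majority value 7 and the class name RoughBenchmark are arbitrary illustrations, and a System.nanoTime micro-benchmark like this is only indicative:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Naive timing sketch (not a proper JMH benchmark)
class RoughBenchmark {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n : new int[]{1_000, 100_000, 10_000_000}) {
            int[] nums = new int[n];
            for (int i = 0; i < n; i++) {
                // value 7 is the majority element; the rest is random filler
                nums[i] = (i % 2 == 0 || rnd.nextBoolean()) ? 7 : rnd.nextInt(1000);
            }
            long t0 = System.nanoTime();
            int bySort = majorityBySort(nums.clone());
            long t1 = System.nanoTime();
            int byMap = majorityByMap(nums);
            long t2 = System.nanoTime();
            System.out.printf("n=%d sort=%dms (%d) map=%dms (%d)%n",
                    n, (t1 - t0) / 1_000_000, bySort, (t2 - t1) / 1_000_000, byMap);
        }
    }

    static int majorityBySort(int[] nums) {
        Arrays.sort(nums);
        return nums[nums.length / 2];
    }

    static int majorityByMap(int[] nums) {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int x : nums) map.merge(x, 1, Integer::sum);
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            if (e.getValue() > nums.length / 2) return e.getKey();
        }
        throw new IllegalStateException("no majority");
    }
}

On small arrays the sorting version typically wins on constant factors (primitive sort, no boxing); only on much larger inputs does the asymptotic advantage of the map tend to show.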

answered Oct 09 '22 by ptyx


It depends on the test cases: for some of them the HashMap will be faster, and for others it won't be.

Why is that? Solution 1 guarantees O(N log N) in the worst case, but the HashMap solution costs roughly O(N · (M + R)), where M is the cost of collisions and R is the cost of resizing the internal array.

HashMap internally uses an array of nodes called table, and it resizes that array as the number of entries grows. You assigned an initial capacity of only 100, so larger inputs force several resizes.

So let's see what happens: Java uses separate chaining to resolve collisions, and some test cases may produce lots of collisions, which makes every query and update of the hashmap more expensive.

Conclusion: the performance of the hashmap implementation is affected by two factors:

1. how often the table array has to be resized for the given input size
2. how many collisions appear in the input
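
As an illustration of the first factor (my own sketch, not from the answer): sizing the map for the input up front avoids the intermediate resizes that an initial capacity of 100 forces on large test cases. The capacity formula below assumes HashMap's default load factor of 0.75, and the class name MajorityPreSized is made up.

import java.util.HashMap;
import java.util.Map;

class MajorityPreSized {
    public int majorityElement(int[] nums) {
        // Capacity chosen so that up to nums.length distinct keys fit
        // without triggering a resize (default load factor is 0.75)
        int capacity = (int) (nums.length / 0.75f) + 1;
        HashMap<Integer, Integer> map = new HashMap<>(capacity);
        for (int x : nums) {
            map.merge(x, 1, Integer::sum);
        }
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            if (e.getValue() > nums.length / 2) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("no majority element");
    }
}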

answered Oct 09 '22 by heaprc