Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find only one repeating number in million(s) number

This puzzle was asked to me recently in adobe interview:- There is a array containing millions of unordered positive number where all elements are distinct except for one number which occurs exactly twice. The motive is to find that twice occuring number in the most optimum way.

P.S. Absolutely no order/pattern is applicable to the array.

The interviewer rejected the possibility of any sort as that would take lot of time, he wanted to question to be taken as a puzzle and then propose a smarter solution.

like image 531
Vaibhav Arora Avatar asked Feb 09 '23 20:02

Vaibhav Arora


2 Answers

First approach would be to just sort the array then go through the sorted data until you find two identical consecutive numbers. This could easily be done in O(n log n) time and O(1) space.

If the interviewer then asked if there was a better way, you would then discuss any limitations that may be on the data (order/pattern doesn't necessarily imply no limitations on the data at all). You should also question what they actually mean by optimum - the term itself means little without a quantity being measured.

Some people optimise for time, some for space, some (like myself) even optimise for code readability :-)

In terms of discussing limitations, an example would be if the range of the numbers was limited to several million. Then it would be a simple matter to create an array of counts and process all the data in O(n) time with something like:

dim array[several million] as zero
for each number:
    array[number]++
    if array[number] == 2:
        print number
        stop

Even without such a limitation, a 32-bit number range could use an array of four billion or so bits (about 500M), and that's your classic example of trading space for time.

Keep in mind that interview questions aren't trying to figure out if you have a solution to a given problem, they're so the interviewer can see your thought processes. More often than not, your greatest asset is not an encyclopedic knowledge of algorithms, it's you ability to think intelligently about problems and how to solve them.

like image 134
paxdiablo Avatar answered Feb 22 '23 17:02

paxdiablo


A single sequential pass through the array with hashing the values into a set will tell me the duplicate. This is O(n), but uses memory and data structures for the HashSet. The worst case for Hashing is duplicates in the first and last place.

Sorting even up to 25M integers is fast, ~2 sec, and - although O(n log n) - has a relatively constant time, and is much faster than the worst case for hashing. OTOH, hashing may beat sorting, as well as the next method:-

Fastest is using a BitMap for registering numbers (~ 1 sec), although this may require a considerable amount of memory ((0x7FFF_FFFF+1)/8 - i.e., the number of non-negative integers divided by bits per bytes), but here allocation is straightforward. Again, the worst case is duplicates in the first and last place.

Here is the code I've used for comparison. I should be taken with care, like most naive benchmarks in Java. But it shows that code readability isn't an issue with any of the approaches.

public class Duplicate {
    public static void main(String[] args) throws Exception {
        Random r = new Random( 100L );
        int[] a = new int[25000000];
        Set<Integer> set  = new HashSet<>(a.length/2);
        boolean dupl = true;
        for( int i = 0; i < a.length; ){
            int x = Math.abs( r.nextInt() );
            if( set.add( x ) ){
                a[i++] = x;
            }
        }
        a[a.length-1] = a[0]; // Worst case for HashSet and BitSet
        set = null;

        System.out.println( "hash " + new Date() );
        set  = new HashSet<>();
        for( int i = 0; i < a.length; ++i ){
            if( ! set.add( a[i] ) ){
                System.out.println( a[i] );
                break;
            }
        }
        set = null;

        System.out.println( "bitmap " + new Date() );
        BitSet bs = new BitSet( 0x7FFF_FFFF ); 
        for( int i = 0; i < a.length; ++i ){
            if( bs.get( a[i]-1 ) ){
                System.out.println( a[i] );
                break;
            }
            bs.set( a[i]-1 );
        }

        System.out.println( "sort "  + new Date());
        Arrays.sort( a );
        for( int i = 1; i < a.length; ++ i ){
            if( a[i] == a[i-1] ){
                System.out.println( a[i] );
                break;
            }
        }
        System.out.println( "done " + new Date() );
    }
}

Later Note that Java 8 has Arrays.sortParallel. Given you have the HW, this will further reduce sort time. - Also note that the bit set method is based on the spec "positive numbers". It would complicate matters if negative numbers were to be included, but I suspect that the interviewers wanted to learn about the candidate's "fluency" w.r.t. Java's java.util resources.

like image 23
laune Avatar answered Feb 22 '23 17:02

laune