Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Array, Finding Duplicates

Tags:

java

arrays

I have an array, and am looking for duplicates.

duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
    for(k = 0; k < zipcodeList.length; k++){
        if (zipcodeList[k] == zipcodeList[j]){
            duplicates = true;
        }
    }
}

However, this code doesnt work when there are no duplicates. Whys that?

like image 572
Snowman Avatar asked Oct 17 '10 01:10

Snowman


People also ask

How do you find duplicates in an array?

function checkIfArrayIsUnique(myArray) { for (var i = 0; i < myArray. length; i++) { for (var j = 0; j < myArray. length; j++) { if (i != j) { if (myArray[i] == myArray[j]) { return true; // means there are duplicate values } } } } return false; // means there are no duplicate values. }


3 Answers

On the nose answer..

duplicates=false; for (j=0;j<zipcodeList.length;j++)   for (k=j+1;k<zipcodeList.length;k++)     if (k!=j && zipcodeList[k] == zipcodeList[j])       duplicates=true; 

Edited to switch .equals() back to == since I read somewhere you're using int, which wasn't clear in the initial question. Also to set k=j+1, to halve execution time, but it's still O(n2).

A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist) {   Set<Integer> lump = new HashSet<Integer>();   for (int i : zipcodelist)   {     if (lump.contains(i)) return true;     lump.add(i);   }   return false; } 

Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist) {    final int MAXZIP = 99999;    boolean[] bitmap = new boolean[MAXZIP+1];    java.util.Arrays.fill(bitmap, false);    for (int item : zipcodeList)      if (!bitmap[item]) bitmap[item] = true;      else return true;    }    return false; } 

Or Just to be Compact

static boolean duplicates(final int[] zipcodelist) {    final int MAXZIP = 99999;    boolean[] bitmap = new boolean[MAXZIP+1];  // Java guarantees init to false    for (int item : zipcodeList)      if (!(bitmap[item] ^= true)) return true;    return false; } 

Does it Matter?

Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:

import java.util.BitSet;  class Yuk {   static boolean duplicatesZero(final int[] zipcodelist)   {     boolean duplicates=false;     for (int j=0;j<zipcodelist.length;j++)       for (int k=j+1;k<zipcodelist.length;k++)         if (k!=j && zipcodelist[k] == zipcodelist[j])           duplicates=true;      return duplicates;   }     static boolean duplicatesOne(final int[] zipcodelist)   {     final int MAXZIP = 99999;     boolean[] bitmap = new boolean[MAXZIP + 1];     java.util.Arrays.fill(bitmap, false);     for (int item : zipcodelist) {       if (!(bitmap[item] ^= true))         return true;     }     return false;   }    static boolean duplicatesTwo(final int[] zipcodelist)   {     final int MAXZIP = 99999;      BitSet b = new BitSet(MAXZIP + 1);     b.set(0, MAXZIP, false);     for (int item : zipcodelist) {       if (!b.get(item)) {         b.set(item, true);       } else         return true;     }     return false;   }    enum ApproachT { NSQUARED, HASHSET, BITSET};    /**    * @param args    */   public static void main(String[] args)   {     ApproachT approach = ApproachT.BITSET;      final int REPS = 100;     final int MAXZIP = 99999;      int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };     long[][] times = new long[sizes.length][REPS];      boolean tossme = false;      for (int sizei = 0; sizei < sizes.length; sizei++) {       System.err.println("Trial for zipcodelist size= "+sizes[sizei]);       for (int rep = 0; rep < REPS; rep++) {         int[] zipcodelist = new int[sizes[sizei]];         for (int i = 0; i < zipcodelist.length; i++) {           zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));         }         long begin = System.currentTimeMillis();         switch (approach) {         case NSQUARED :           tossme ^= (duplicatesZero(zipcodelist));           break;         case HASHSET :           tossme ^= (duplicatesOne(zipcodelist));           break;         case BITSET :           tossme ^= (duplicatesTwo(zipcodelist));           break;          }         long end = System.currentTimeMillis();         times[sizei][rep] = end - begin;         }       long avg = 0;       for (int rep = 0; rep < REPS; rep++) {         avg += times[sizei][rep];       }       System.err.println("Size=" + sizes[sizei] + ", avg time = "             + avg / (double)REPS + "ms");     }   }  } 

With NSQUARED:

Trial for size= 10 Size=10, avg time = 0.0ms Trial for size= 1000 Size=1000, avg time = 0.0ms Trial for size= 10000 Size=10000, avg time = 100.0ms Trial for size= 100000 Size=100000, avg time = 9923.3ms 

With HashSet

Trial for zipcodelist size= 10 Size=10, avg time = 0.16ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.15ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.16ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms 

With BitSet

Trial for zipcodelist size= 10 Size=10, avg time = 0.0ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.0ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.0ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms 

BITSET Wins!

But only by a hair... .15ms is within the error for currentTimeMillis(), and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return true because there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:

 return true; 

And you won't be wrong very often.

like image 111
andersoj Avatar answered Sep 29 '22 12:09

andersoj


Let's see how your algorithm works:

an array of unique values:

[1, 2, 3]

check 1 == 1. yes, there is duplicate, assigning duplicate to true.
check 1 == 2. no, doing nothing.
check 1 == 3. no, doing nothing.
check 2 == 1. no, doing nothing.
check 2 == 2. yes, there is duplicate, assigning duplicate to true.
check 2 == 3. no, doing nothing.
check 3 == 1. no, doing nothing.
check 3 == 2. no, doing nothing.
check 3 == 3. yes, there is duplicate, assigning duplicate to true.

a better algorithm:

for (j=0;j<zipcodeList.length;j++) {
    for (k=j+1;k<zipcodeList.length;k++) {
        if (zipcodeList[k]==zipcodeList[j]){ // or use .equals()
            return true;
        }
    }
}
return false;
like image 34
Lie Ryan Avatar answered Sep 29 '22 13:09

Lie Ryan


You can use bitmap for better performance with large array.

    java.util.Arrays.fill(bitmap, false);

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else break;

UPDATE: This is a very negligent answer of mine back in the day, keeping it here just for reference. You should refer to andersoj's excellent answer.

like image 21
Huy Le Avatar answered Sep 29 '22 13:09

Huy Le