Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java - What is the best way to find first duplicate character in a string

Tags:

java

algorithm

I have written below code for detecting first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

I need help on two things:

  • What is the worst case complexity of this algorithm? - my understanding is O(n).
  • Is it the best way to do this? Can somebody provide a better solution (if any)?

Thanks, NN

like image 334
Niranjan Avatar asked Sep 06 '12 17:09

Niranjan


2 Answers

As mentioned by others, your algorithm is O(n^2). Here is an O(N) algorithm, because HashSet#add runs in constant time ( the hash function disperses the elements properly among the buckets) - Note that I originally size the hashset to the maximum size to avoid resizing/rehashing:

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}

Note: this returns the index of the first duplicate (i.e. the index of the first character that is a duplicate of a previous character). To return the index of the first appearance of that character, you would need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}
like image 64
assylias Avatar answered Nov 14 '22 22:11

assylias


The complexity is roughly O(M^2), where M is the minimum between the length of the string and the size of the set of possible characters K.

You can get it down to O(M) with O(K) memory by simply memorizing the position where you first encounter every unique character.

like image 24
Qnan Avatar answered Nov 14 '22 23:11

Qnan