Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting if a string has unique characters: comparing my solution to "Cracking the Coding Interview?"

I am working through the book "Cracking the Coding Interview" and I have come across questions here asking for answers, but I need help comparing my answer to the solution. My algorithm works, but I am having difficulty understand the solution in the book. Mainly because I don't understand what some of the operators are really doing.

The task is: "Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?"

This is my solution:

public static boolean checkForUnique(String str){     boolean containsUnique = false;      for(char c : str.toCharArray()){         if(str.indexOf(c) == str.lastIndexOf(c)){             containsUnique = true;         } else {             containsUnique = false;         }     }      return containsUnique; } 

It works, but how efficient is this? I saw that the complexity of the index functions for String in Java are O(n*m)

Here is the solution from the book:

public static boolean isUniqueChars(String str) {     if (str.length() > 256) {         return false;     }     int checker = 0;     for (int i = 0; i < str.length(); i++) {         int val = str.charAt(i) - 'a';         if ((checker & (1 << val)) > 0) return false;         checker |= (1 << val);     }     return true; } 

A couple things I am not quite understanding with the solution. First, what does the "|=" operator do? Why is 'a' subtracted from the current character in the string for the value of "val"? I know "<<" is a bitwise left shift, but what does (checker & (1<<val)) do? I know it is bitwise and, but I am not understanding it since I am not understanding the line where checker gets a value.

I am just not familiar with these operations and unfortunately the book does not give an explanation of the solutions, probably because it assumes you already understand these operations.

like image 313
Seephor Avatar asked Oct 21 '13 00:10

Seephor


People also ask

How do I check if a string has a unique character?

A unique string consists of characters that occur only once. To check for uniqueness, compare each character with the rest of the string. If a character is repeated, then the string is not unique.

How do I find unique characters in a string in C++?

First we will initialize all values of counter array to 0 and all values of index array to n (length of string). On traversal of the string str and for every character c, increase count[x], if count[x] = 1, index[x] = i. If count[x] = 2, index[x] = n. Sort indexes and print characters.


2 Answers

There are two separate questions here: what's the efficiency of your solution, and what is the reference solution doing? Let's treat each independently.

First, your solution:

public static boolean checkForUnique(String str){     boolean containsUnique = false;      for(char c : str.toCharArray()){         if(str.indexOf(c) == str.lastIndexOf(c)){             containsUnique = true;         } else {             containsUnique = false;         }     }      return containsUnique; } 

Your solution essentially consists of a loop over all characters in the string (let's say there are n of them), checking on each iteration whether the first and last index of the characters are the same. The indexOf and lastIndexOf methods each take time O(n), because they have to scan across all the characters of the string to determine if any of them match the one you're looking for. Therefore, since your loop runs O(n) times and does O(n) work per iteration, its runtime is O(n2).

However, there's something iffy about your code. Try running it on the string aab. Does it work correctly on this input? As a hint, as soon as you determine that there are two or more duplicated characters, you're guaranteed that there are duplicates and you can return that not all characters are unique.

Now, let's look at the reference:

public static boolean isUniqueChars(String str) {     if (str.length() > 256) { // NOTE: Are you sure this isn't 26?         return false;     }     int checker = 0;     for (int i = 0; i < str.length(); i++) {         int val = str.charAt(i) - 'a';         if ((checker & (1 << val)) > 0) return false;         checker |= (1 << val);     }     return true; } 

This solution is cute. The basic idea is the following: imagine that you have an array of 26 booleans, each one tracking whether a particular character has appeared in the string already. You start with all of them false. You then iterate across the characters of the string, and each time you see a character you look into the array slot for that character. If it's false, this is the first time you've seen the character and you can set the slot to true. If it's true, you've already seen this character and you can immediately report that there's a duplicate.

Notice that this method doesn't allocate an array of booleans. Instead, it opts for a clever trick. Since there are only 26 different characters possible and there are 32 bits in an int, the solution creates an int variable where each bit of the variable corresponds to one of the characters in the string. Instead of reading and writing an array, the solution reads and writes the bits of the number.

For example, look at this line:

if ((checker & (1 << val)) > 0) return false; 

What does checker & (1 << val) do? Well, 1 << val creates an int value that has all bits zero except for the valth bit. It then uses bitwise AND to AND this value with checker. If the bit at position val in checker is already set, then this evaluates to a nonzero value (meaning we've already seen the number) and we can return false. Otherwise, it evaluates to 0, and we haven't seen the number.

The next line is this:

checker |= (1 << val); 

This uses the "bitwise OR with assignment" operator, which is equivalent to

checker = checker | (1 << val); 

This ORs checker with a value that has a 1 bit set only at position val, which turns the bit on. It's equivalent to setting the valth bit of the number to 1.

This approach is much faster than yours. First, since the function starts off by checking if the string has length greater than 26 (I'm assuming the 256 is a typo), the function never has to test any string of length 27 or greater. Therefore, the inner loop runs at most 26 times. Each iteration does O(1) work in bitwise operations, so the overall work done is O(1) (O(1) iterations times O(1) work per iteration), which is significantly faster than your implementation.

If you haven't seen bitwise operations used this way, I'd recommend searching for "bitwise operators" on Google to learn more.

Hope this helps!

like image 195
templatetypedef Avatar answered Sep 24 '22 23:09

templatetypedef


The book solution is one I don't like and I believe is dysfunctional..... templatetypedef has posted a comprehensive answer which indicates that the solution is a good one. I disagree, since the book's answer assumes that the string only has lower-case characters, (ascii) and does no validation to ensure that.

public static boolean isUniqueChars(String str) {     // short circuit - supposed to imply that     // there are no more than 256 different characters.     // this is broken, because in Java, char's are Unicode,     // and 2-byte values so there are 32768 values     // (or so - technically not all 32768 are valid chars)     if (str.length() > 256) {         return false;     }     // checker is used as a bitmap to indicate which characters     // have been seen already     int checker = 0;     for (int i = 0; i < str.length(); i++) {         // set val to be the difference between the char at i and 'a'         // unicode 'a' is 97         // if you have an upper-case letter e.g. 'A' you will get a         // negative 'val' which is illegal         int val = str.charAt(i) - 'a';         // if this lowercase letter has been seen before, then         // the corresponding bit in checker will have been set and         // we can exit immediately.         if ((checker & (1 << val)) > 0) return false;         // set the bit to indicate we have now seen the letter.         checker |= (1 << val);     }     // none of the characters has been seen more than once.     return true; } 

The bottom line, given templatedef's answer too, is that there's not actually enough information to determine whether the book's answer is right.

I distrust it though.

templatedef's answer on the complexity is one I agree with though.... ;-)

EDIT: As an exercise, I converted the book's answer to one that will work (albeit slower than the book's answer - BigInteger is slow).... This version does the same logic as the book's, but does not have the same validation and assumption problems (but it is slower). It is useful to show the logic too.

public static boolean isUniqueChars(String str) {     if (str.length() > 32768) {         return false;     }     BigInteger checker = new BigInteger(0);     for (int i = 0; i < str.length(); i++) {         int val = str.charAt(i);         if (checker.testBit(val)) return false;         checker = checker.setBit(val);     }     // none of the characters has been seen more than once.     return true; } 
like image 36
rolfl Avatar answered Sep 21 '22 23:09

rolfl