the following code will check to see if you have any duplicate characters in the string, but i don't understand the if clause:
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) > 0)
return false;
checker |= (1 << val);
}
return true;
}
I tried to look up some references, I am new to bit shifting, all i understand is that << shifts the binary number left or right. Can you explain to me how checker |= (1 << val) works ? and that 'if' statement as well.
I was also going through this book Cracking the Code Interview and ended up googling for a clear explanations. Finally I understood the concept.
Here is the approach.
Note :
We will assume, in the below code, that the string is only lower case ‘a’ through ‘z’. This will allow us to use just a single int.
Java integer is of size 32
Number of lower case alphabets is 26
So we can clearly set 0/1 (true or false) value inside one integer in decimal notation.
It is similar to bool visited[32] . bool uses 1 byte. Hence you need 32 bytes for storing bool visited[32].
Bit masking is a space optimization to this.
Lets start :
int val = str.charAt(i) - 'a';
For 'b' it is 1. ie 98-97 .
(1 << val) // 1<<1 => 10(binary)
Now let us see how bitwise & works
0 & 0 -> 0
0 & 1 -> 0
1 & 0 -> 0
1 & 1 -> 1
So by the below code :
(checker & (1 << val))
We check if the checker[val] == 0 . Suppose we had already encountered 'b'.
check = 0000 0000 0000 0000 0000 1000 1000 0010 &
'b' = 0000 0000 0000 0000 0000 0000 0000 0010
----------------------------------------------
result= 0000 0000 0000 0000 0000 0000 0000 0010
ie decimal value = 2 which is >0
So you finally we understood this part.
if ((checker & (1 << val)) > 0)
return false;
( This part is called as bit masking. )
OR's Truth table
0 | 0 -> 0
0 | 1 -> 1
1 | 0 -> 1
1 | 1 -> 1
So
check = 0000 0000 0000 0000 0000 1000 1000 0000 |
'b' = 0000 0000 0000 0000 0000 0000 0000 0010
----------------------------------------------
result= 0000 0000 0000 0000 0000 1000 1000 0010
So that simplifies this part:
checker |= (1 << val); // checker = checker | (1 << val);
I hope this helped someone !
Seems like I am late to the party, but let me take a stab at the explanation.
First of all the AND i.e & operation:
0 & 0 = 0
1 & 0 = 0
0 & 1 = 0
1 & 1 = 1
So basically, if you are given a bit, and you want to find out if its 1 or 0, you just & it with a 1. If the result is 1 then you had a 1, else you had 0. We will use this property of the & below.
The OR i.e | operation
0 | 0 = 0
1 | 0 = 1
0 | 1 = 1
1 | 1 = 1
So basically, if you are given a bit, and you want to do something to it so that the output is always 1, then you do an | 1 with it.
Now, In Java the type int
is 4 bytes i.e. 32 bits. Thus we can use an int
itself as a data-structure to store 32 states or booleans in simpler terms, since a bit can either be 0 or 1 i.e false or true. Since we assume that our string is composed of only lower case characters, we have enough space inside our int to store a boolean for each of the 26 chars!
So first we initialize our data-structure that we call checker
to 0 which is nothing but 32 zeros: 0000000000000000000000
.
So far so good?
Now we go through our string, for each character, first we get an integer representation of the character.
int val = str.charAt(i) - 'a';
We subtract a
from it because we want our integer to be 0 based. So if vals:
a = 0 i.e. `0000000000000000000000`
b = 1 i.e. `0000000000000000000001`
c = 2 i.e. `0000000000000000000010`
d = 4 i.e. `0000000000000000000100`
Now as seen above, a is 32 zeros, but rest of the characters have a single 1 and 31 zeros. So when we use these characters, we left shift
each of them by 1, i.e. (1 << val), so each of them have a single 1 bit, and 31 zero bits:
a = 1 i.e. `0000000000000000000001`
b = 2 i.e. `0000000000000000000010`
c = 4 i.e. `0000000000000000000100`
d = 8 i.e. `0000000000000000001000`
We are done with the setup. Now we do 2 things:
First assume all characters are different. For every char we encounter, we want our datastructure i.e. checker to have 1 for that char. So we use our OR property descrived above to generate a 1 in our datastructure, and hence we do:
checker = checker | (1 << val);
Thus checker stores 1 for every character we encounter.
Now we come to the part where characters can repeate. So before we do step 1, we want to make sure that the checker already does not have a 1 at the position corresponding to the current character. So we check the value of
checker & (1 << val)
So with help of the AND property described above, if we get a 1 from this operation, then checker already had a 1 at that position, which means we must have encountered this character before. So we immediately return false.
That's it. If all our & checks return 0, we finally return true, meaning there were no character repititions.
1 << val
is the same as 2 to the degree of val
. So it's a number which has
just one one in its binary representation (the one is at position val+1
, if you count from
the right side of the number to the left one).
a |= b
means basically this: set in a
all binary flags/ones from the
binary representation of b
(and keep those in a
which were already set).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With