Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding bit vector usage in finding unique character in a string [duplicate]

Tags:

java

string

bit

I have the following code that finds unique characters in a string using bit vectors. We assume it's an ASCII char set with lower case letters only.

I am having a hard time understanding the use of bit vectors below. Even after debugging through the program and following the changes variables go through after each loop.

// assuming that the characters range from a-z
static boolean isUniqueBitVector(String str) {
    int checker = 0;

    for(int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';

        if((checker & (1 << val)) > 0) {
            return false;
        } else {
            checker |= (1 << val);
        }
    }
    return true;
}

What's the purpose of left shifting the val(int representation of each char in string) by 1 and AND'ing it with checker (initialized to 0) and OR'ing it in the else block.

like image 868
Pha3drus Avatar asked Dec 18 '22 09:12

Pha3drus


1 Answers

checker is a 32 bit integer, of which we assign the lowest 26 to store a flag for each letter a-z. We can use bits for this because a letter can be non-unique only once.

int val = str.charAt(i) - 'a'; computes the index of the bit that corresponds to the current letter. This is where the assumption that the input only contains a-z comes in. val will be a number between zero and 25.

In general, (1 << val) is the bit corresponding to the selected letter. << is the left shift operator. It is used to move 1, which only has a single bit on, val positions to the left. So for example, 1<<3 would be 8. In fact, left shifting is equivalent to multiplying by a power of 2.

(checker & (1 << val)) verifies if the selected bit is on or not. The & operator is called bitwise and. It combines two numbers and sets any bits that are on in both to on. Remember that 1 << val only has a single bit turned on. The only way the result of that and checker is going to be nonzero is if checker already has the same bit turned on. In that case we return false because a letter has been repeated.

In the case that a new letter has been found, we need to turn on the correct bit in checker. This is what checker |= (1 << val); does. The bitwise or operator, |, turns on a bit in the result if it is on in either operand. Again, 1 << val has only one bit on. Therefore the result of the or is to replicate checker and turn on that one bit (whether or not it was already on).

All of the operations you see here are very common idioms. Hopefully my explanation has helped you in some way, because you will almost certainly see very similar things in any simple bit twiddling code.

like image 50
Mad Physicist Avatar answered May 07 '23 01:05

Mad Physicist