How does this print "hello world"?

5-bit codification

With 5 bits it is possible to represent 2⁵ = 32 characters. The English alphabet has 26 letters, which leaves room for 32 - 26 = 6 symbols besides the letters. With this encoding scheme you can have all 26 English letters (in one case) plus 6 symbols, space being one of them.

Algorithm description

The >>= 5 in the for-loop moves from one 5-bit group to the next; each group is then isolated by ANDing the number with the mask 31₁₀ = 11111₂ in the expression l & 31.
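The extraction step on its own can be sketched as below. The class and method names are made up for illustration; the constant is the packed value used elsewhere on this page.

```java
import java.util.ArrayList;
import java.util.List;

class FiveBitGroups {
    // Hypothetical helper: splits a positive long into 5-bit groups, lowest group first.
    static List<Integer> groups(long packed) {
        List<Integer> out = new ArrayList<>();
        for (long l = packed; l > 0; l >>= 5) {
            out.add((int) (l & 31)); // mask 31 = 0b11111 keeps the lowest 5 bits
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [8, 5, 12, 12, 15, 31, 23, 15, 18, 12, 4]
        System.out.println(groups(4946144450195624L));
    }
}
```

Each number is the 5-bit code of one character, in reading order: 8 is h, 5 is e, 31 is space, and so on.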

Now the code maps the 5-bit value to its corresponding 7-bit ASCII character. This is the tricky part; check the binary representations of the lowercase letters in the following table:

  ascii   |     ascii     |    ascii     |    algorithm
character | decimal value | binary value | 5-bit codification 
--------------------------------------------------------------
  space   |       32      |   0100000    |      11111
    a     |       97      |   1100001    |      00001
    b     |       98      |   1100010    |      00010
    c     |       99      |   1100011    |      00011
    d     |      100      |   1100100    |      00100
    e     |      101      |   1100101    |      00101
    f     |      102      |   1100110    |      00110
    g     |      103      |   1100111    |      00111
    h     |      104      |   1101000    |      01000
    i     |      105      |   1101001    |      01001
    j     |      106      |   1101010    |      01010
    k     |      107      |   1101011    |      01011
    l     |      108      |   1101100    |      01100
    m     |      109      |   1101101    |      01101
    n     |      110      |   1101110    |      01110
    o     |      111      |   1101111    |      01111
    p     |      112      |   1110000    |      10000
    q     |      113      |   1110001    |      10001
    r     |      114      |   1110010    |      10010
    s     |      115      |   1110011    |      10011
    t     |      116      |   1110100    |      10100
    u     |      117      |   1110101    |      10101
    v     |      118      |   1110110    |      10110
    w     |      119      |   1110111    |      10111
    x     |      120      |   1111000    |      11000
    y     |      121      |   1111001    |      11001
    z     |      122      |   1111010    |      11010

Here you can see that the ASCII characters we want to map all begin with the 7th and 6th bits set (11xxxxx₂), except for space, which has only the 6th bit on. You could OR the 5-bit code with 96 (96₁₀ = 1100000₂) and that would be enough for the letters, but it wouldn't work for space (darn space!): 11111₂ OR 1100000₂ gives 1111111₂ = 127₁₀, not 32₁₀.
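A quick way to see the problem with the naive approach is to try it. This is only an illustration of the OR 96 idea described above, not code from the question; the names are invented.

```java
class NaiveOr96 {
    // Naive mapping: switch on the 6th and 7th bits (value 96) and nothing else.
    static int naive(int group) {
        return group | 96;
    }

    public static void main(String[] args) {
        System.out.println((char) naive(1));   // a (97)
        System.out.println((char) naive(26));  // z (122)
        System.out.println(naive(31));         // 127, the DEL control code, not 32 (space)
    }
}
```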

So special care has to be taken to process space in the same expression as the other characters. To achieve this, the code turns the 7th bit on (but not the 6th) in the extracted 5-bit group with an OR 64 (64₁₀ = 1000000₂), giving l & 31 | 64.

So far the 5-bit group has the form 10xxxxx₂ (space would be 1011111₂ = 95₁₀). If we can map space to 0 without affecting the other values, then we can turn the 6th bit on and that is all. This is where the mod 95 part comes into play: space is 1011111₂ = 95₁₀, so with the mod operation, (l & 31 | 64) % 95, only space goes back to 0. After this, the code turns the 6th bit on by adding 32₁₀ = 100000₂ to the previous result, ((l & 31 | 64) % 95) + 32, transforming the 5-bit value into a valid ASCII character.

isolates 5 bits --+          +---- takes 'space' (and only 'space') back to 0
                  |          |
                  v          v
              ((l & 31 | 64) % 95) + 32
                       ^           ^ 
       turns the       |           |
      7th bit on ------+           +--- turns the 6th bit on
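The same expression can be split into named steps to make each stage visible. This is a sketch for a single 5-bit group; the class, method and variable names are illustrative only.

```java
class DecodeGroup {
    // Maps one 5-bit group (letters 1..26, space 31) to its ASCII character.
    static char decode(int group) {
        int withBit7 = group | 64;         // 10xxxxx: letters become 65..90, space becomes 95
        int spaceToZero = withBit7 % 95;   // only 95 wraps to 0; smaller values are unchanged
        return (char) (spaceToZero + 32);  // letters become 97..122, space becomes 32
    }

    public static void main(String[] args) {
        // prints [he z]
        System.out.println("[" + decode(8) + decode(5) + decode(31) + decode(26) + "]");
    }
}
```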

The following code does the inverse: given a lowercase string (max 12 chars), it returns the 64-bit long value that can be used with the OP's code:

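public class D {
    public static void main(String... args) {
        String v = "hello test";
        int len = Math.min(12, v.length()); // at most 12 groups of 5 bits fit in a positive long
        long res = 0L;
        for (int i = 0; i < len; i++) {
            // Low 5 bits of the char: 'a'..'z' -> 1..26, ' ' (32) -> 0
            long c = (long) v.charAt(i) & 31;
            // (31 - c) / 31 is 1 only when c == 0, so space becomes 31; letters keep c
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        System.out.println(res);
    }
}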
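To check that the packing and the decoding loop really are inverses, a round trip can be sketched like this. It assumes the same rules as the class above but replaces the branch-free arithmetic with a plain conditional for readability; the class and method names are hypothetical.

```java
class RoundTrip {
    // Packs up to 12 lowercase letters or spaces, 5 bits each, first character lowest.
    static long encode(String s) {
        long res = 0L;
        int len = Math.min(12, s.length());
        for (int i = 0; i < len; i++) {
            long c = s.charAt(i) & 31;        // 'a'..'z' -> 1..26, ' ' -> 0
            long group = (c == 0) ? 31 : c;   // space is stored as 31
            res |= group << (5 * i);
        }
        return res;
    }

    // The decoding loop discussed on this page, collecting the characters into a String.
    static String decode(long packed) {
        StringBuilder sb = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            sb.append((char) (((l & 31 | 64) % 95) + 32));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long packed = encode("hello world");
        System.out.println(packed);          // 4946144450195624
        System.out.println(decode(packed));  // hello world
    }
}
```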

To add to the answers above, the following Groovy script prints the intermediate values.

String getBits(long l) {
    return Long.toBinaryString(l).padLeft(8,'0');
}

for (long l = 4946144450195624l; l > 0; l >>= 5){
    println ''
    print String.valueOf(l).toString().padLeft(16,'0')
    print '|'+ getBits((l & 31 ))
    print '|'+ getBits(((l & 31 | 64)))
    print '|'+ getBits(((l & 31 | 64)  % 95))
    print '|'+ getBits(((l & 31 | 64)  % 95 + 32))

    print '|';
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
}

Here is the output:

4946144450195624|00001000|01001000|01001000|01101000|h
0154567014068613|00000101|01000101|01000101|01100101|e
0004830219189644|00001100|01001100|01001100|01101100|l
0000150944349676|00001100|01001100|01001100|01101100|l
0000004717010927|00001111|01001111|01001111|01101111|o
0000000147406591|00011111|01011111|00000000|00100000| 
0000000004606455|00010111|01010111|01010111|01110111|w
0000000000143951|00001111|01001111|01001111|01101111|o
0000000000004498|00010010|01010010|01010010|01110010|r
0000000000000140|00001100|01001100|01001100|01101100|l
0000000000000004|00000100|01000100|01000100|01100100|d

Interesting!

Standard visible ASCII characters are in the range 32 to 126 (127 is the non-printing DEL), which is 95 characters.

That's why you see 32 and 95 (127 - 32) there.

In fact, each character is mapped to 5 bits here (you can work out the 5-bit combination for each character), and then all the bits are concatenated to form one large number.

Positive longs are 63-bit numbers, large enough to hold the encoded form of 12 characters. So a long is large enough to hold hello world, but for longer texts you need larger numbers, or even a BigInteger.
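As a rough sketch of the BigInteger variant, the same 5-bit scheme can be applied to text longer than 12 characters. This is an assumption about how one might extend the idea, not code from the question; all names are illustrative, and it only handles lowercase letters and spaces.

```java
import java.math.BigInteger;

class BigPacking {
    static final BigInteger MASK = BigInteger.valueOf(31);

    // Packs the text 5 bits per character, with the first character in the lowest bits.
    static BigInteger encode(String s) {
        BigInteger res = BigInteger.ZERO;
        for (int i = s.length() - 1; i >= 0; i--) {
            int c = s.charAt(i) & 31;          // 'a'..'z' -> 1..26, ' ' -> 0
            int group = (c == 0) ? 31 : c;     // space is stored as 31
            res = res.shiftLeft(5).or(BigInteger.valueOf(group));
        }
        return res;
    }

    // Same mapping as the long-based loop, one 5-bit group at a time.
    static String decode(BigInteger packed) {
        StringBuilder sb = new StringBuilder();
        for (BigInteger b = packed; b.signum() > 0; b = b.shiftRight(5)) {
            int group = b.and(MASK).intValue();
            sb.append((char) (((group | 64) % 95) + 32));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(decode(encode(text)).equals(text)); // true
    }
}
```

For an 11-character input such as hello world, encode produces the same number as the long version.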

In an application we wanted to transfer visible English characters, Persian characters and symbols via SMS. There are 32 (the number of Persian characters) + 95 (the number of English characters and standard visible symbols) = 127 possible values, which can be represented with 7 bits.

We converted each 16-bit character to a 7-bit code, a saving of more than 56% (1 - 7/16 = 56.25%). So we could send texts of twice the length in the same number of SMSs. (Much the same thing happens here.)

The result you are getting is the char representation of the values below:

104 -> h
101 -> e
108 -> l
108 -> l
111 -> o
32  -> (space)
119 -> w
111 -> o
114 -> r
108 -> l
100 -> d

You've encoded characters as 5-bit values and packed 11 of them into a 64 bit long.

(packedValues >> 5*i) & 31 is the i-th encoded value with a range 0-31.
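For example, the indexed form lets you read any position directly instead of walking the loop. A minimal sketch, with a hypothetical helper name:

```java
class GroupAt {
    // Returns the i-th 5-bit value (0-based, lowest group first), always in 0..31.
    static int groupAt(long packedValues, int i) {
        return (int) ((packedValues >> 5 * i) & 31);
    }

    public static void main(String[] args) {
        long packedValues = 4946144450195624L;
        System.out.println(groupAt(packedValues, 0));   // 8, the code for h
        System.out.println(groupAt(packedValues, 5));   // 31, the code for space
        System.out.println(groupAt(packedValues, 10));  // 4, the code for d
    }
}
```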

The hard part, as you say, is encoding the space. The lowercase English letters occupy the contiguous range 97-122 in Unicode (and ASCII, and most other encodings), but the space is 32.

To overcome this, you used some arithmetic. ((x+64)%95)+32 is almost the same as x + 96 (note how bitwise OR is equivalent to addition, in this case), but when x=31, we get 32.
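Both claims are easy to check exhaustively, since there are only 32 possible values of x. The following is a small verification sketch; the class and method names are invented for the example.

```java
class OrVersusAdd {
    // The arithmetic form quoted above, for a 5-bit value x in 0..31.
    static int mapped(int x) {
        return ((x + 64) % 95) + 32;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 32; x++) {
            // x only uses bits 0..4, so OR with 64 sets a free bit and equals x + 64.
            if ((x | 64) != x + 64) throw new AssertionError("OR differs at x=" + x);
            // Every value except 31 lands on x + 96; 31 wraps to 0 and ends up at 32.
            int expected = (x == 31) ? 32 : x + 96;
            if (mapped(x) != expected) throw new AssertionError("mapping differs at x=" + x);
        }
        System.out.println("all 32 values check out");
    }
}
```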
