
Bit shifting and bit mask - sample code

I've come across some code that uses the bit masks 0xff and 0xff00, or in 16-bit binary form 00000000 11111111 and 11111111 00000000.

/**
 * Function to check if the given string is in GZIP Format.
 *
 * @param inString String to check.
 * @return True if GZIP Compressed otherwise false.
 */
public static boolean isStringCompressed(String inString)
{
    try
    {
        byte[] bytes = inString.getBytes("ISO-8859-1");
        int gzipHeader = ((int) bytes[0] & 0xff)
            | ((bytes[1] << 8) & 0xff00);
        return GZIPInputStream.GZIP_MAGIC == gzipHeader;
    } catch (Exception e)
    {
        return false;
    }
}

I'm trying to work out what the purpose of these bit masks is in this context (against a byte array). I can't see what difference they would make.

This method seems to be written for GZip-compressed strings. The GZip magic number is 35615, which is 8B1F in hex and 10001011 00011111 in binary.

Am I correct in thinking this swaps the bytes? For example, say my input string were \u001f\u008b:

bytes[0] & 0xff
 bytes[0] = 1f = 00011111
          & ff = 11111111
                 --------
               = 00011111

bytes[1] << 8
 bytes[1] = 8b = 10001011
          << 8 = 10001011 00000000

((bytes[1] << 8) & 0xff00)
= 10001011 00000000 & 0xff00

  10001011 00000000
  11111111 00000000 &
  -----------------
= 10001011 00000000

So

00000000 00011111
10001011 00000000 |
-----------------
10001011 00011111 = 8B1F

To me it doesn't seem like the & is doing anything to the original byte in either case (bytes[0] & 0xff and (bytes[1] << 8) & 0xff00). What am I missing?

PDStat asked Jan 08 '23 08:01


2 Answers

int gzipHeader = ((int) bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);

The type byte in Java is signed. If you cast a byte to an int, its sign is extended, so a negative byte fills the upper 24 bits of the int with 1s. The & 0xff masks out those 1 bits from the sign extension, effectively treating the byte as if it were unsigned.

Likewise for 0xff00, except that the byte is first shifted 8 bits to the left. After the shift, the sign-extended bits sit in bits 16 to 31, and 0xff00 keeps only bits 8 to 15.

So, what this does is:

  • take the first byte, bytes[0], cast it to int and mask out the sign-extended bits (treating the byte as if it is unsigned)
  • take the second byte, cast it to int, shift it left by 8 bits, and mask out the sign-extended bits
  • combine the values with |

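Note that the left shift puts the second byte in the high-order position, so the two bytes are combined in little-endian order. That is why the bytes 1f 8b end up as 0x8b1f, which looks like a byte swap.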
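
To see the sign extension concretely, here is a minimal sketch that combines the two example bytes from the question with and without the masks. The class and method names are made up for illustration; only the expression in withMasks comes from the original code.

```java
public class SignExtensionDemo {
    // Combines two bytes as in the question's code: first byte low, second byte high, masks applied.
    static int withMasks(byte b0, byte b1) {
        return ((int) b0 & 0xff) | ((b1 << 8) & 0xff00);
    }

    // Same combination without the masks, so any sign-extended bits leak into the result.
    static int withoutMasks(byte b0, byte b1) {
        return b0 | (b1 << 8);
    }

    public static void main(String[] args) {
        byte b0 = (byte) 0x1f; // 31 as a signed byte
        byte b1 = (byte) 0x8b; // -117 as a signed byte

        System.out.println(Integer.toHexString(b1));                   // ffffff8b
        System.out.println(Integer.toHexString(b1 & 0xff));            // 8b
        System.out.println(Integer.toHexString(b1 << 8));              // ffff8b00
        System.out.println(Integer.toHexString(withMasks(b0, b1)));    // 8b1f
        System.out.println(Integer.toHexString(withoutMasks(b0, b1))); // ffff8b1f
    }
}
```

With these particular bytes only the second mask changes the result, because 0x1f is non-negative as a signed byte. The first mask would matter for any first byte of 0x80 or above.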

Jesper answered Jan 14 '23 11:01


Apparently the purpose is to read the first two bytes and store them as one 16-bit word in gzipHeader by suitable masking and shifting. More precisely, the first part keeps exactly the low 8 bits of the first byte, while the second part keeps the 8 bits of the second byte after they have been shifted left by 8. The | combines both masked values into a single int.

The resulting value is compared against the defined value GZIPInputStream.GZIP_MAGIC to determine if the first two bytes are the defined beginning of data compressed with gzip.
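
As a rough check of that comparison, the sketch below runs the same header test on bytes produced by the JDK's GZIPOutputStream and on plain text. The class GzipMagicCheck and its helpers hasGzipHeader and gzip are hypothetical names for this example. Unlike the original method, it works on a byte array directly and checks the length instead of catching an exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipMagicCheck {
    // Hypothetical helper: same header test as the question's code, applied to a byte array.
    static boolean hasGzipHeader(byte[] bytes) {
        if (bytes == null || bytes.length < 2) {
            return false; // too short to hold the two-byte magic number
        }
        int header = (bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
        return header == GZIPInputStream.GZIP_MAGIC;
    }

    // Hypothetical helper: compresses text with GZIPOutputStream to get real gzip bytes.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(hasGzipHeader(gzip("hello")));                            // true
        System.out.println(hasGzipHeader("hello".getBytes(StandardCharsets.UTF_8))); // false
        System.out.println(Integer.toHexString(GZIPInputStream.GZIP_MAGIC));         // 8b1f
    }
}
```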

Codor answered Jan 14 '23 11:01