Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading bytes from a JavaScript string

Tags:

javascript

People also ask

How do you find the bytes of a string?

The getBytes() method encodes a given String into a sequence of bytes and returns an array of bytes. The method can be used in below two ways: public byte[] getBytes(String charsetName) : It encodes the String into sequence of bytes using the specified charset and return the array of those bytes.

How many bytes is a string in JavaScript?

The size of a JavaScript string is. Always 2 bytes per character.

What is a byte in JavaScript?

The byte() function in p5. js is used to convert the given string of number, number value or boolean into its byte representation. This byte number can only be whole number in between -128 to 127.

How much memory does a string use JavaScript?

The string in JavaScript is treated as a sequence of UTF-16 code units and some character takes up 16 bits memory space and others 32 bits.


I believe that you can can do this with relatively simple bit operations:

function stringToBytes ( str ) {
  var ch, st, re = [];
  for (var i = 0; i < str.length; i++ ) {
    ch = str.charCodeAt(i);  // get char 
    st = [];                 // set up "stack"
    do {
      st.push( ch & 0xFF );  // push byte to stack
      ch = ch >> 8;          // shift value down by 1 byte
    }  
    while ( ch );
    // add stack contents to result
    // done because chars have "wrong" endianness
    re = re.concat( st.reverse() );
  }
  // return an array of bytes
  return re;
}

stringToBytes( "A\u1242B\u4123C" );  // [65, 18, 66, 66, 65, 35, 67]

It should be a simple matter to sum the output up by reading the byte array as if it were memory and adding it up into larger numbers:

function getIntAt ( arr, offs ) {
  return (arr[offs+0] << 24) +
         (arr[offs+1] << 16) +
         (arr[offs+2] << 8) +
          arr[offs+3];
}

function getWordAt ( arr, offs ) {
  return (arr[offs+0] << 8) +
          arr[offs+1];
}

'\\u' + getWordAt( stringToBytes( "A\u1242" ), 1 ).toString(16);  // "1242"

Borgar's answer seems correct.

Just wanted to clarify one point. Javascript treats bitwise operations as '32-bit signed int's, where the last (left-most) bit is the sign bit. Ie,

getIntAt([0x7f,0,0,0],0).toString(16)  //  "7f000000"

getIntAt([0x80,0,0,0],0).toString(16)  // "-80000000"

However, for octet-data processing (eg, network stream, etc), usually want the 'unsigned int' representation. This can be accomplished by adding a '>>> 0' (zero-fill right-shift) operator which internally tells Javascript to treat this as unsigned.

function getUIntAt ( arr, offs ) {
  return (arr[offs+0] << 24) +
         (arr[offs+1] << 16) +
         (arr[offs+2] << 8) +
          arr[offs+3] >>> 0;
}

getUIntAt([0x80,0,0,0],0).toString(16)   // "80000000"

There are two methods for encoding and decoding utf-8 string to a byte array and back.

var utf8 = {}

utf8.toByteArray = function(str) {
    var byteArray = [];
    for (var i = 0; i < str.length; i++)
        if (str.charCodeAt(i) <= 0x7F)
            byteArray.push(str.charCodeAt(i));
        else {
            var h = encodeURIComponent(str.charAt(i)).substr(1).split('%');
            for (var j = 0; j < h.length; j++)
                byteArray.push(parseInt(h[j], 16));
        }
    return byteArray;
};

utf8.parse = function(byteArray) {
    var str = '';
    for (var i = 0; i < byteArray.length; i++)
        str +=  byteArray[i] <= 0x7F?
                byteArray[i] === 0x25 ? "%25" : // %
                String.fromCharCode(byteArray[i]) :
                "%" + byteArray[i].toString(16).toUpperCase();
    return decodeURIComponent(str);
};

// sample
var str = "Да!";
var ba = utf8.toByteArray(str);
alert(ba);             // 208, 148, 208, 176, 33
alert(ba.length);      // 5
alert(utf8.parse(ba)); // Да!

While @Borgar answers the question correctly, his solution is pretty slow. It took me a while to track it down (I used his function somewhere in a larger project), so I thought I would share my insight.

I ended up having something like @Kadm. It's not some little percent faster, it's like 500 times faster (no exaggeration!). I wrote a little benchmark, so you can see it for yourself :)

function stringToBytesFaster ( str ) { 
var ch, st, re = [], j=0;
for (var i = 0; i < str.length; i++ ) { 
    ch = str.charCodeAt(i);
    if(ch < 127)
    {
        re[j++] = ch & 0xFF;
    }
    else
    {
        st = [];    // clear stack
        do {
            st.push( ch & 0xFF );  // push byte to stack
            ch = ch >> 8;          // shift value down by 1 byte
        }
        while ( ch );
        // add stack contents to result
        // done because chars have "wrong" endianness
        st = st.reverse();
        for(var k=0;k<st.length; ++k)
            re[j++] = st[k];
    }
}   
// return an array of bytes
return re; 
}

Borga's solution works perfectly. In case you want a more concrete implementation, you may want to have a look at the BinaryReader class from vjeux (which, for the records, is based on the binary-parser class from Jonas Raoni Soares Silva).