Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bitwise operations on strings in javascript

In javascript the following test of character to character binary operations prints 0 676 times:

var s = 'abcdefghijklmnopqrstuvwxyz';
var i, j;
for(i=0; i<s.length;i++){ for(j=0; j<s.length;j++){ console.log(s[i] | s[j]) }};

If js was using the actual binary representation of the strings I would expect some non-zero values here.

Similarly, testing binary operations on strings and integers, the following print 26 255s and 0s, respectively. (255 was chosen because it is 11111111 in binary).

var s = 'abcdefghijklmnopqrstuvwxyz';
var i; for(i=0; i<s.length;i++){ console.log(s[i] | 255) }
var i; for(i=0; i<s.length;i++){ console.log(s[i] & 255) }

What is javascript doing here? It seems like javascript is casting any string to false before binary operations.

Notes

If you try this in python, it throws an error:

>>> s = 'abcdefghijklmnopqrstuvwxyz'
>>> [c1 | c2 for c2 in s for c1 in s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'str' and 'str'

But stuff like this seems to work in php.

like image 552
turtlemonvh Avatar asked Mar 05 '14 23:03

turtlemonvh


1 Answers

In JavaScript, when a string is used with a binary operator it is first converted to a number. Relevant portions of the ECMAScript spec are shown below to explain how this works.

Bitwise operators:

The production A : A @ B, where @ is one of the bitwise operators in the productions above, is evaluated as follows:

  1. Let lref be the result of evaluating A.
  2. Let lval be GetValue(lref).
  3. Let rref be the result of evaluating B.
  4. Let rval be GetValue(rref).
  5. Let lnum be ToInt32(lval).
  6. Let rnum be ToInt32(rval).
  7. Return the result of applying the bitwise operator @ to lnum and rnum. The result is a signed 32 bit integer.

ToInt32:

The abstract operation ToInt32 converts its argument to one of 232 integer values in the range −231 through 231−1, inclusive. This abstract operation functions as follows:

  1. Let number be the result of calling ToNumber on the input argument.
  2. If number is NaN, +0, −0, +∞, or −∞, return +0.
  3. Let posInt be sign(number) * floor(abs(number)).
  4. Let int32bit be posInt modulo 232; that is, a finite integer value k of Number type with positive sign and less than 232 in magnitude such that the mathematical difference of posInt and k is mathematically an integer multiple of 232.
  5. If int32bit is greater than or equal to 231, return int32bit − 232, otherwise return int32bit.

The internal ToNumber function will return NaN for any string that cannot be parsed as a number, and ToInt32(NaN) will give 0. So in your code example all of the bitwise operators with letters as the operands will evaluate to 0 | 0, which explains why only 0 is printed.

Note that something like '7' | '8' will evaluate to 7 | 8 because in this case the strings used as the operands can be successfully convered to a number.

As for why the behavior in Python is different, there isn't really any implicit type conversion in Python so an error is expected for any type that doesn't implement the binary operators (by using __or__, __and__, etc.), and strings do not implement those binary operators.

Perl does something completely different, bitwise operators are implemented for strings and it will essentially perform the bitwise operator for the corresponding bytes from each string.

If you want to use JavaScript and get the same result as Perl, you will need to first convert the characters to their code points using str.charCodeAt, perform the bitwise operator on the resulting integers, and then use String.fromCodePoint to convert the resulting numeric values into characters.

like image 119
Andrew Clark Avatar answered Oct 08 '22 11:10

Andrew Clark