
Null character in strings

Consider this string:

var s = "A\0Z";

Its length is 3, as given by s.length. Using console.log you can see the string isn't cut: s[1] shows up as "" (it is actually the one-character string "\0") and s.charCodeAt(1) is 0.
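A minimal sketch of those checks, runnable in a browser console or Node.js. The names `sample` and `observed` are illustrative only and not part of the original snippet:

```javascript
// Same content as the string in the question: "A", NUL (U+0000), "Z".
const sample = "A\0Z";

// Collect the observations in one object so they are easy to inspect.
const observed = {
  length: sample.length,             // 3: the NUL counts like any other character
  middleIsNul: sample[1] === "\0",   // true: a one-character string holding U+0000
  middleIsEmpty: sample[1] === "",   // false: it only looks empty when printed
  middleCode: sample.charCodeAt(1),  // 0
  last: sample[2],                   // "Z": nothing after the NUL is lost
};

console.log(observed);
```

The point of the sketch is that the engine itself keeps all three characters; only the display step differs between browsers.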

When you alert it in Firefox, you see AZ. When you alert it in Chrome/Linux using alert(s), the \0 terminates the string and you see A.

My question is: what should browsers and JavaScript engines do? Is Chrome buggy here? Is there a document defining what should happen?

As this is a question about the standard, a reference is needed.

Denys Séguret asked Dec 04 '12 08:12




1 Answer

What the browser should do is keep track of the string and its length separately, since the standard has no notion of a null terminator. (A String value is just a sequence of characters with a length.)
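As a rough illustration of "content plus length, no terminator", the ordinary string operations below all see the characters after the NUL. This is only a sketch; the variable names are made up for the example:

```javascript
// A string with an embedded NUL (U+0000) between "A" and "Z".
const withNul = "A\0Z";

// Concatenation keeps everything, including what follows the NUL.
const joined = withNul + "!";
console.log(joined.length);         // 4

// Searching and slicing work past the NUL.
const zIndex = withNul.indexOf("Z");
console.log(zIndex);                // 2
const tail = withNul.slice(2);
console.log(tail);                  // "Z"

// The NUL can be used as a separator like any other character.
const parts = withNul.split("\0");
console.log(parts);                 // an array holding "A" and "Z"
```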

What Chrome seems to do (I am taking your word for this) is use the standard C string functions, which stop at a \0. To answer one of your questions: yes, to me this constitutes a bug in Chrome's handling of the alert() function.
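To make the suspected mechanism concrete, here is a small simulation of how a NUL-terminated C string routine would "see" a JavaScript string. The helper `asCString` is hypothetical and written only for this illustration; it is not Chrome's code, just a model of the truncation described above:

```javascript
// Hypothetical helper: returns the part of the string that a C function
// relying on a terminating NUL (for example strlen) would treat as the
// whole string.
function asCString(str) {
  const nul = str.indexOf("\0");
  // No NUL present: the C view and the JavaScript view agree.
  return nul === -1 ? str : str.slice(0, nul);
}

console.log(asCString("A\0Z"));         // "A"
console.log(asCString("A\0Z").length);  // 1, although the real length is 3
console.log(asCString("AZ"));           // "AZ"
```

Under that assumption, the "A" shown by alert(s) is exactly what falls out when the string's stored length is ignored in favor of the first NUL.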

Formally the spec says:

A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence. All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character may appear in the form of an escape sequence.

Also:

A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of character values (CV) contributed by the various parts of the string literal.

And regarding the NUL byte:

The CV [Character Value] of EscapeSequence :: 0 [lookahead ∉ DecimalDigit] is a <NUL> character (Unicode value 0000).

Therefore, a NUL character should simply be "yet another character value" and have no special meaning, as opposed to other languages where it might end a String value (SV).
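A short sketch of what "yet another character value" means in practice: the different ways of writing U+0000 produce the same one-character string, and that character takes part in comparisons like any other. The variable names are illustrative:

```javascript
// Four spellings of the same character, U+0000.
const fromEscape = "\0";
const fromHex = "\x00";
const fromUnicode = "\u0000";
const fromCode = String.fromCharCode(0);

const allEqual =
  fromEscape === fromHex &&
  fromHex === fromUnicode &&
  fromUnicode === fromCode;
console.log(allEqual);            // true

// The NUL is significant for equality: these are different strings.
console.log("A\0Z" === "AZ");     // false

// It also orders like any other character: value 0 sorts before "Z" (90).
console.log("A\0Z" < "AZ");       // true
```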

For a reference of the valid single-character escape sequences, see section 7.8.4 of the ECMAScript Language Specification. A table at the end of that section lists them.

What someone aiming to write a JavaScript engine could probably learn from this: don't use C/C++ string functions. :)

Gung Foo answered Oct 20 '22 11:10
