Why is "👍".length === 2?

And how does any textarea in my browser handle something that JavaScript counts as 2 characters but renders as one?

For example:

"👍".length
// -> 2

More examples here: https://jsbin.com/zazexenigi/edit?js,console

asked Jul 13 '16 by filype



2 Answers

JavaScript uses UTF-16 to represent strings internally.

Unicode defines 1,112,064 possible characters (the valid code points, once the surrogate range is excluded). UTF-16 stores each code point as one or two 16-bit code units, i.e. two bytes each. A single 16-bit code unit can only distinguish 65,536 values, so code points above U+FFFF have to be represented with two code units, known as a surrogate pair(*).
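You can see the two code units directly. The 👍 emoji is code point U+1F44D, which UTF-16 encodes as the surrogate pair 0xD83D 0xDC4D:

"👍".length;                      // -> 2 (two UTF-16 code units)
"👍".charCodeAt(0).toString(16);  // -> "d83d" (high surrogate)
"👍".charCodeAt(1).toString(16);  // -> "dc4d" (low surrogate)
"👍".codePointAt(0).toString(16); // -> "1f44d" (the actual code point)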

A string's length property (it is a property, not a method) returns the number of code units in the string, not the number of characters.

MDN explains this well on the page about String.prototype.length:

This property returns the number of code units in the string. UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters, but needs to use two code units for less commonly-used characters, so it's possible for the value returned by length to not match the actual number of characters in the string.
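If you want the number of code points instead, a minimal sketch using the string iterator, which walks the string code point by code point rather than code unit by code unit:

[..."👍"].length;        // -> 1
Array.from("👍").length; // -> 1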

(*): Code points in the supplementary planes, i.e. the range U+010000 – U+10FFFF, take four bytes (two 16-bit code units) in UTF-16, but this doesn't change the answer: some characters need more than 2 bytes to be represented, so they need more than one code unit.
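The mapping to a surrogate pair is simple arithmetic; here is a sketch of the standard UTF-16 encoding algorithm for a supplementary code point:

// Encode a code point above 0xFFFF as a UTF-16 surrogate pair
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // 20 bits remain
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low  = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

toSurrogatePair(0x1F44D).map(n => n.toString(16)); // -> ["d83d", "dc4d"]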

One caveat about testing this in the console: String.fromCharCode deals in code units, not code points, and truncates its argument to 16 bits. So the following prints 1 not because U+3FFFF fits in one code unit (it needs 21 bits, hence two code units), but because the argument is silently reduced to 0xFFFF. Use String.fromCodePoint to build the string properly, and you get a length of 2:

console.log(String.fromCharCode(0x03FFFF).length);  // -> 1 (argument truncated to 0xFFFF)
console.log(String.fromCodePoint(0x03FFFF).length); // -> 2 (a real surrogate pair)
answered by rpadovani


I believe rpadovani answered your "why" question best, but for an implementation that will get you a proper glyph count in this situation, Lodash has tackled this problem in its toArray module.

For example,

_.toArray('12👪').length; // --> 3

Or, if you want to knock a few arbitrary characters off a string, you can manipulate and rejoin the array, like:

_.toArray("👪trimToEightGlyphs").splice(0,8).join(''); // --> '👪trimToE'
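On modern engines you can get the same counts without Lodash, since spreading a string iterates by code point; note that a glyph built from several code points (e.g. a ZWJ emoji sequence) would still count as more than one element:

[..."12👪"].length;                              // -> 3
[..."👪trimToEightGlyphs"].slice(0, 8).join(''); // -> '👪trimToE'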
answered by Evan Rusackas