How to get the nth (Unicode) character from a string in JavaScript

Tags:

unicode

Suppose we have a string with some (astral) Unicode characters:

const s = 'Hi 👋 Unicode!'

The [] operator and the .charAt() method don't work for getting the 4th character, which should be "👋". Both return only the first half of a UTF-16 surrogate pair, which displays as �:

> s[3]
'�'
> s.charAt(3)
'�'
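To see what s[3] actually contains, here is a quick check (a sketch, runnable in Node.js or a browser console). Indexing and .length count UTF-16 code units, and 👋 (U+1F44B) takes two of them, a surrogate pair:

```javascript
const s = 'Hi 👋 Unicode!';

// .length counts UTF-16 code units: 'Hi ' (3) + '👋' (2) + ' Unicode!' (9) = 14
console.log(s.length); // 14

// s[3] is only the high (leading) surrogate of the pair...
console.log(s[3] === '\uD83D'); // true
// ...and s[4] is the low (trailing) surrogate.
console.log(s[4] === '\uDC4B'); // true

// Only the two code units together form the emoji.
console.log(s[3] + s[4] === '👋'); // true
```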

The .codePointAt() method does get the correct value for the 4th character, but unfortunately it returns a number, which has to be converted back to a string using String.fromCodePoint():

> String.fromCodePoint(s.codePointAt(3))
'👋'
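The difference between .charCodeAt() and .codePointAt() at the same index can be sketched like this: the first returns one code unit, the second combines a surrogate pair that starts at that index into a single code point number:

```javascript
const s = 'Hi 👋 Unicode!';

// charCodeAt() returns a single UTF-16 code unit (here, the high surrogate).
console.log(s.charCodeAt(3).toString(16)); // 'd83d'

// codePointAt() reads the whole surrogate pair starting at index 3.
const cp = s.codePointAt(3);
console.log(cp.toString(16)); // '1f44b'

// String.fromCodePoint() turns the number back into a string of two code units.
const ch = String.fromCodePoint(cp);
console.log(ch === '👋', ch.length); // true 2
```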

Similarly, converting the string into an array using spread syntax (...) yields valid Unicode characters, so that's another way of getting the 4th one:

> [...s][3]
'👋'
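Spread syntax and Array.from() both consume the string's iterator, so they produce the same array of code points. A small sketch comparing them:

```javascript
const s = 'Hi 👋 Unicode!';

// Both yield one element per code point, so 👋 becomes a single element.
const viaSpread = [...s];
const viaArrayFrom = Array.from(s);

console.log(viaSpread.length); // 13 code points (vs. s.length === 14 code units)
console.log(viaSpread[3]);     // '👋'
console.log(viaArrayFrom[3]);  // '👋'
```

The catch is that the whole string is expanded into an array even when only one element is needed.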

But I can't believe that going from string to number and back to string, or having to split the string into an array, are the only ways of doing this seemingly trivial thing. Isn't there a simple method for doing this?

> s.simpleMethod(3)
'👋'
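As far as I know there is no built-in String method like that, but the idea can be approximated with a small helper. The sketch below is a hypothetical function (nthCodePoint is not part of any standard API) that walks the string iterator and stops at index n, so no intermediate array is built:

```javascript
/**
 * Hypothetical helper (not a built-in String method): returns the n-th
 * code point of str as a string, or undefined if str has fewer than
 * n + 1 code points. Stops iterating as soon as index n is reached.
 */
function nthCodePoint(str, n) {
  let i = 0;
  for (const symbol of str) {
    if (i === n) return symbol;
    i++;
  }
  return undefined;
}

const s = 'Hi 👋 Unicode!';
console.log(nthCodePoint(s, 3));      // '👋'
console.log(nthCodePoint(s, 4));      // ' '
console.log(nthCodePoint('👋🙈', 1)); // '🙈'
console.log(nthCodePoint(s, 99));     // undefined
```

Each call still scans from the start of the string, because JavaScript string indexing is by UTF-16 code unit and there is no built-in mapping from code point index to code unit offset.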

Note: I know that the definition of "character" is somewhat fuzzy, but for the purpose of this question a character is simply the symbol that corresponds to a Unicode code point (no combining characters, no grapheme clusters, etc.).

Update: the String.fromCodePoint(str.codePointAt(n)) method is not really viable, since n there is an index in UTF-16 code units, so it doesn't take previous astral symbols into account: String.fromCodePoint('👋🙈'.codePointAt(1)) // => '�'
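One way to keep using .codePointAt() is to first translate the code point index into a code unit offset. This is a minimal sketch; codeUnitOffset is a hypothetical helper, and it assumes that any code point above U+FFFF occupies exactly two code units (which is how UTF-16 works):

```javascript
/**
 * Hypothetical helper: converts a code point index n into the matching
 * UTF-16 code unit offset by stepping over surrogate pairs. Returns -1
 * if str has n or fewer code points.
 */
function codeUnitOffset(str, n) {
  let offset = 0;
  for (let i = 0; i < n; i++) {
    if (offset >= str.length) return -1;
    const cp = str.codePointAt(offset);
    // Astral code points (above U+FFFF) take two code units.
    offset += cp > 0xffff ? 2 : 1;
  }
  return offset < str.length ? offset : -1;
}

const pair = '👋🙈';
// The naive call reads from code unit 1: the trailing surrogate of 👋.
console.log(pair.codePointAt(1).toString(16)); // 'dc4b'

// Converting the index first gives the intended code point.
const offset = codeUnitOffset(pair, 1);
console.log(offset); // 2
console.log(String.fromCodePoint(pair.codePointAt(offset))); // '🙈'
```

This still scans the prefix of the string, so it is no faster than iterating; it only makes the original approach correct.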

(I feel kind of dumb asking this, like I'm probably missing something obvious. But previous answers to this question don't work on strings with Unicode symbols on astral planes.)


asked Sep 11 '17 14:09

epidemian

1 Answer

The string iterator (used by for...of, spread syntax, and Array.from) iterates through code points rather than UCS-2/UTF-16 code units. So:

const string = 'Hi 👋 Unicode!';
for (const symbol of string) {
  console.log(symbol);
}

So to get a specific code point based on its index from a string:

const string = 'Hi 👋 Unicode!';
// Note: The spread operator uses the string iterator under the hood.
const symbols = [...string];
symbols[3]; // '👋'
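Regular expressions with the u flag are another place where JavaScript works in code points rather than code units. The sketch below is an alternative I'm adding, not part of the answer above; note that match() returns null for an empty string, so real code would need a guard:

```javascript
const s = 'Hi 👋 Unicode!';

// With the u flag, '.' matches a whole code point instead of one code unit;
// the s flag lets '.' also match line terminators, so every code point is captured.
const symbols = s.match(/./gsu);
console.log(symbols[3]); // '👋'

// Without the u flag, '.' matches single code units and splits the pair.
console.log(s.match(/./gs)[3] === '\uD83D'); // true
```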

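Still, this would break with grapheme clusters or emoji sequences such as 👨‍👩‍👧‍👦 (👨 + U+200D ZERO WIDTH JOINER + 👩 + U+200D ZERO WIDTH JOINER + 👧 + U+200D ZERO WIDTH JOINER + 👦), which the iterator splits into seven separate code points. Text segmentation (for example, Intl.Segmenter with granularity 'grapheme') helps with that.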
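A sketch of grapheme-based indexing with Intl.Segmenter, assuming an engine that supports it (for example Node.js 16 or later, or a current browser). The family emoji is built from escapes so the invisible joiners are explicit:

```javascript
// 👨‍👩‍👧‍👦 = U+1F468 ZWJ U+1F469 ZWJ U+1F467 ZWJ U+1F466
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
const text = 'Hi ' + family + '!';

// Code units: 4 astral emoji (2 each) + 3 ZWJs (1 each) = 11.
console.log(family.length); // 11
// Code points: the string iterator yields 7 elements.
console.log([...family].length); // 7

// Grapheme segmentation groups the whole ZWJ sequence into one cluster.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(text), (seg) => seg.segment);
console.log(graphemes.length); // 5: 'H', 'i', ' ', family, '!'
console.log(graphemes[3] === family); // true
```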

Do you actually need to get the 4th code point in the string, though? What’s your use case?

answered Sep 20 '22 17:09

Mathias Bynens
