I have a UTF-8 encoded string that comes from an Ajax response, and I want to get the substring of that string up to the first comma. For the string "Привет, мир" it would be "Привет".
Will this work and not run into "multibyte-ness" issues?
var i = text.indexOf(',');
if (i != -1) text = text.substr(0, i);
Or is it better to use split?
JavaScript handles strings as sequences of characters, not bytes.
As such, yes, that's fine from an encoding/string handling standpoint.
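As a rough illustration, here is the question's approach next to the split variant, run on the example string. The helper name beforeFirstComma is made up for this sketch, and slice is used in place of substr only because substr is a legacy method; both behave the same for these arguments.

```javascript
// Hypothetical helper (name not from the original post): returns the part
// of `text` before the first comma, or the whole string if there is none.
function beforeFirstComma(text) {
  const i = text.indexOf(',');
  return i === -1 ? text : text.slice(0, i);
}

// Approach from the question: indexOf + substring extraction.
const viaIndexOf = beforeFirstComma('Привет, мир');

// Alternative from the question: split on the comma and take the first piece.
const viaSplit = 'Привет, мир'.split(',')[0];

// A string without a comma comes back unchanged.
const noComma = beforeFirstComma('Привет');

console.log(viaIndexOf, viaSplit, noComma); // Привет Привет Привет
```

Both give the same result here. split builds an array of all the pieces, so the indexOf version does a little less work when only the first piece is needed.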
You can treat strings in JavaScript as sequences of characters with no particular encoding.
> "漢字".substr(1)
"字"
Note that the above is a simplification, though. As pointed out in the comments, JavaScript strings are sequences of 16-bit UTF-16 code units, not characters. This lets you treat strings "by character" for most common characters, but the abstraction breaks down for characters that need two code units in UTF-16 (a surrogate pair, used for anything outside the Basic Multilingual Plane) and for characters composed of more than one code point. Searching for a comma is still safe: "," is a single code unit that never occurs as part of a surrogate pair.
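A small sketch of where the per-character view breaks down, and of the comma case still working. The sample strings are chosen only for illustration.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
// as a surrogate pair: two 16-bit code units.
const emoji = '😀';
const unitCount = emoji.length;              // 2 code units
const brokenHalf = emoji.slice(0, 1);        // lone high surrogate '\uD83D'
const pointCount = Array.from(emoji).length; // 1 code point (iteration is per code point)

// "e" followed by a combining acute accent: two code points, displayed as one character.
const composed = 'e\u0301';
const composedUnits = composed.length;       // 2

// Cutting at a comma is unaffected: the cut lands between whole characters.
const text = '😀😀, мир';
const head = text.slice(0, text.indexOf(','));

console.log(unitCount, pointCount, composedUnits, head); // 2 1 2 😀😀
```

Slicing at an arbitrary index can split a pair or separate a base letter from its accent; slicing at the index of an ASCII delimiter such as a comma cannot split a pair.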