Answer: You just have to pass an empty string ("") as the separator argument of the split() method. This splits the entire string into individual characters.
The split() method splits a string into an array of substrings. The separator can be a string or a regular expression; if it is omitted, the result is a single-element array containing the whole string. The optional second argument, limit, caps the number of elements in the result.
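For reference, here is a minimal sketch of how JavaScript's String.prototype.split treats its arguments (a separator and an optional limit). The sample strings are arbitrary:

```javascript
// A quick tour of String.prototype.split in JavaScript.
const words = "over population now".split(" "); // literal string separator
const pieces = "a1b22c".split(/\d+/);           // regular-expression separator
const whole = "overpopulation".split();          // no separator: one-element array
const limited = "a,b,c,d".split(",", 2);         // limit caps the number of elements
const chars = "overpopulation".split("");        // empty separator: one element per UTF-16 code unit

console.log(words);        // [ 'over', 'population', 'now' ]
console.log(pieces);       // [ 'a', 'b', 'c' ]
console.log(whole);        // [ 'overpopulation' ]
console.log(limited);      // [ 'a', 'b' ]
console.log(chars.length); // 14
```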
You can split on an empty string:
var chars = "overpopulation".split('');
If you just want to access a string in an array-like fashion, you can do that without split:
var s = "overpopulation";
for (var i = 0; i < s.length; i++) {
  console.log(s.charAt(i));
}
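The loop above can be wrapped in a small helper that collects the characters into an array instead of logging them. The name charsViaCharAt is hypothetical; since charAt indexes UTF-16 code units, the result should be identical to split(''):

```javascript
// Collects characters with charAt, without calling split.
// charAt works on UTF-16 code units, so this mirrors split('') exactly.
function charsViaCharAt(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charAt(i)); // one UTF-16 code unit per iteration
  }
  return out;
}

const viaLoop = charsViaCharAt("overpopulation");
const viaSplit = "overpopulation".split("");
console.log(viaLoop.length === viaSplit.length); // true
```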
You can also access each character with its index using normal array syntax. Note, however, that strings are immutable, which means you can't set the value of a character using this method, and that it isn't supported by IE7 (if that still matters to you).
var s = "overpopulation";
console.log(s[3]); // logs 'r'
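To illustrate the immutability point, here is a sketch: assigning to an index is silently ignored in sloppy mode and throws a TypeError in strict mode, so the usual workaround is to build a new string. Both function names here are hypothetical:

```javascript
// Strings are immutable: index assignment never changes them.
function tryOverwriteFirstChar(s) {
  "use strict"; // strict mode turns the silent failure into a TypeError
  try {
    s[0] = "X";
    return "no error";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

// To "change" a character, build a new string instead.
function replaceCharAt(s, index, ch) {
  return s.slice(0, index) + ch + s.slice(index + 1);
}

const s = "overpopulation";
console.log(tryOverwriteFirstChar(s)); // "TypeError"
console.log(replaceCharAt(s, 3, "R")); // "oveRpopulation"
console.log(s);                        // "overpopulation" (unchanged)
```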
Old question, but I should warn: you'll get weird results from .split('') with non-BMP (non-Basic-Multilingual-Plane) characters.
The reason is that methods like .split('') and .charCodeAt() work on UTF-16 code units, so they only handle characters with a code point below 65536 correctly; higher code points are represented by a pair of (lower-valued) "surrogate" code units.
'๐๐๐'.length // โ> 6
'๐๐๐'.split('') // โ> ["๏ฟฝ", "๏ฟฝ", "๏ฟฝ", "๏ฟฝ", "๏ฟฝ", "๏ฟฝ"]
'๐'.length // โ> 2
'๐'.split('') // โ> ["๏ฟฝ", "๏ฟฝ"]
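Where the surrogate pair comes from can be shown with a little arithmetic: subtract 0x10000, put the top 10 bits into a high surrogate (0xD800 range) and the bottom 10 bits into a low surrogate (0xDC00 range). This is the same arithmetic the String.fromCodePoint() polyfill further down uses. The helper name toSurrogatePair is hypothetical:

```javascript
// Splits a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint <= 0xffff || codePoint > 0x10ffff) {
    throw new RangeError("expected a supplementary code point (U+10000..U+10FFFF)");
  }
  const offset = codePoint - 0x10000;    // 20 bits remain
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

const emoji = "\u{1F602}"; // one character, two code units
console.log(emoji.length);                                       // 2
console.log(toSurrogatePair(0x1f602).map((u) => u.toString(16))); // [ 'd83d', 'de02' ]
console.log(emoji.charCodeAt(0).toString(16));  // 'd83d' (only the high surrogate)
console.log(emoji.codePointAt(0).toString(16)); // '1f602' (the whole code point)
console.log(emoji.split("").length);            // 2 (two lone surrogates)
```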
Using the spread operator:
let arr = [...str];
Or Array.from:
let arr = Array.from(str);
Or split with the new u RegExp flag:
let arr = str.split(/(?!$)/u);
Examples:
[...'𝟘𝟙𝟚'] // → ["𝟘", "𝟙", "𝟚"]
[...'😂😃😄'] // → ["😂", "😃", "😄"]
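A quick side-by-side check of the three approaches, as a sketch (the wrapper names are hypothetical). One difference worth knowing: for an empty string the spread and Array.from versions return an empty array, while the regex split returns an array holding one empty string:

```javascript
// Three ways to split a string by code point rather than by UTF-16 code unit.
const bySpread = (str) => [...str];
const byArrayFrom = (str) => Array.from(str);
const bySplitU = (str) => str.split(/(?!$)/u);

const digits = "\u{1D7D8}\u{1D7D9}\u{1D7DA}"; // three characters, six code units
console.log(digits.length);              // 6
console.log(bySpread(digits).length);    // 3
console.log(byArrayFrom(digits).length); // 3
console.log(bySplitU(digits).length);    // 3

// Edge case: the empty string.
console.log(bySpread("")); // []
console.log(bySplitU("")); // [ '' ]
```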
I came up with this function, which internally uses the MDN example to get the correct code point of each character.
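function stringToArray(str) {
  var i = 0,
      arr = [],
      codePoint;
  // knownCharCodeAt (the MDN example, not shown here) returns the code point
  // of the i-th character, or NaN once i runs past the last character
  while (!isNaN(codePoint = knownCharCodeAt(str, i))) {
    arr.push(String.fromCodePoint(codePoint));
    i++;
  }
  return arr;
}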
This requires the knownCharCodeAt() function and, for some browsers, a String.fromCodePoint() polyfill:
if (!String.fromCodePoint) {
  // ES6 Unicode Shims 0.1, © 2012 Steven Levithan, MIT License
  String.fromCodePoint = function fromCodePoint() {
    var chars = [], point, offset, units, i;
    for (i = 0; i < arguments.length; ++i) {
      point = arguments[i];
      offset = point - 0x10000;
      // code points above U+FFFF are encoded as a high/low surrogate pair
      units = point > 0xFFFF ? [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)] : [point];
      chars.push(String.fromCharCode.apply(null, units));
    }
    return chars.join("");
  };
}
Examples:
stringToArray('𝟘𝟙𝟚') // → ["𝟘", "𝟙", "𝟚"]
stringToArray('😂😃😄') // → ["😂", "😃", "😄"]
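If you'd rather not carry knownCharCodeAt() around, the same idea can be sketched with String.prototype.codePointAt() (ES2015), which returns the full code point when the index points at a high surrogate. This is a swapped-in technique, not the MDN-based function above, and the name stringToArrayCodePoints is hypothetical:

```javascript
// Walks the string by code unit, but emits whole code points.
function stringToArrayCodePoints(str) {
  const arr = [];
  for (let i = 0; i < str.length; ) {
    const codePoint = str.codePointAt(i);
    arr.push(String.fromCodePoint(codePoint));
    i += codePoint > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
  }
  return arr;
}

console.log(stringToArrayCodePoints("a\u{1F602}b")); // [ 'a', '😂', 'b' ]
```

Engines that have codePointAt() also have a native String.fromCodePoint(), so no polyfill is needed for this version.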
Note: str[index]
(ES5) and str.charAt(index)
will also return weird results with non-BMP charsets. e.g. '๐'.charAt(0)
returns "๏ฟฝ"
.
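If you need the n-th character counted in code points rather than code units, one sketch is to index into Array.from(str). The helper name codePointCharAt is hypothetical, and it costs O(length) per call, so for repeated lookups convert once and reuse the array:

```javascript
// The n-th character counted in code points, not UTF-16 code units.
// Array.from iterates by code point, so surrogate pairs stay together.
function codePointCharAt(str, n) {
  const chars = Array.from(str);
  return n >= 0 && n < chars.length ? chars[n] : "";
}

const s = "\u{1F602}!"; // two characters, three code units
console.log(s.charAt(0) === "\uD83D");              // true: just the high surrogate
console.log(codePointCharAt(s, 0) === "\u{1F602}"); // true: the whole character
console.log(codePointCharAt(s, 1));                 // !
```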
UPDATE: Read this nice article about JS and Unicode.
It's as simple as:
s.split("");
The delimiter is an empty string, so it splits between every single character (strictly speaking, between every UTF-16 code unit).
.split('') would split emojis in half.
Onur's solutions and the proposed regexes work for some emojis, but they can't handle more complex languages or combined emojis. Consider this emoji being ruined:
[..."🏳️‍🌈"] // returns ["🏳", "\uFE0F", "\u200D", "🌈"] instead of ["🏳️‍🌈"]
Also consider this Hindi text "अनुच्छेद", which is split like this:
[..."अनुच्छेद"] // returns ["अ", "न", "ु", "च", "्", "छ", "े", "द"]
but should in fact be split like this:
["अ", "नु", "च्", "छे", "द"]
because some of the characters are combining marks (think diacritics/accents in European languages).
You can use the grapheme-splitter library for this:
https://github.com/orling/grapheme-splitter
It does proper, standards-based splitting into user-perceived characters across all the hundreds of exotic edge cases (yes, there are that many).
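If you'd rather avoid a dependency, newer engines (Node 16+ and current browsers) ship Intl.Segmenter, which splits text into grapheme clusters. This is a built-in alternative, not what the answer above recommends, and its output may differ from grapheme-splitter for some inputs because the two can follow different Unicode versions. The wrapper name splitGraphemes is hypothetical:

```javascript
// Split a string into grapheme clusters with the built-in Intl.Segmenter.
function splitGraphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable of { segment, index, input } objects
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

const flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}"; // rainbow flag built from four code points
console.log([...flag].length);             // 4
console.log(splitGraphemes(flag).length);  // 1
console.log(splitGraphemes("e\u0301").length); // 1: e plus a combining acute accent
console.log(splitGraphemes("अनुच्छेद"));    // grouping depends on the engine's Unicode/ICU version
```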