If I have a string like "😀123👨‍👩‍👧‍👦", how can I split it into an array that looks like ["😀", "1", "2", "3", "👨‍👩‍👧‍👦"]? If I use ToCharArray(), the first emoji is split into 2 characters and the second into 11 (it consists of 7 code points: four emoji joined by three zero-width joiners).
Update
The solution now looks like this:
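    // Requires: using System.Collections.Generic;
    public static List<string> GetCharacters(string text)
    {
        List<string> characters = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // A high surrogate followed by a low surrogate encodes one code point.
            // (A char can never exceed 65535, so no range check is needed.)
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                characters.Add(new string(new[] { c, text[i + 1] }));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // BMP character, or an unpaired surrogate that is kept as-is
                characters.Add(c.ToString());
            }
        }
        return characters;
    }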
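Please note that, as mentioned in the comments, it doesn't work for the family emoji. It only works for emoji that consist of at most 2 UTF-16 characters (a single surrogate pair). The output of the example would be: ["😀", "1", "2", "3", "👨", "\u200D", "👩", "\u200D", "👧", "\u200D", "👦"], where each "\u200D" is an invisible zero-width joiner.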
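One way to keep joined sequences such as the family emoji together is to split on text elements (grapheme clusters) instead of surrogate pairs. The following is a minimal sketch, not a definitive solution: the class and method names are made up for illustration, and it assumes the code runs on .NET 5 or later, where StringInfo follows the Unicode grapheme cluster rules. Older runtimes are expected to break the family emoji into more pieces.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class GraphemeSplitter // hypothetical helper name
{
    // Splits text into text elements as reported by StringInfo.
    // Assumption: on .NET 5+ a text element is an extended grapheme cluster.
    public static List<string> GetTextElements(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }
}
```

Under that assumption, calling GraphemeSplitter.GetTextElements with the string from the question should return five elements, with the family emoji as the last one.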
UTF-8 can represent all 1,114,112 Unicode code points. Most C code that handles strings byte by byte still works, since UTF-8 is fully compatible with 7-bit ASCII. Characters usually require fewer than four bytes, and sorting UTF-8 strings by byte value preserves code point order.
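To make the variable byte length concrete, here is a small sketch that asks the UTF-8 encoder how many bytes a string needs. The class and method names are hypothetical; Encoding.UTF8.GetByteCount is the standard .NET call.

```csharp
using System.Text;

public static class Utf8Demo // hypothetical helper name
{
    // Number of bytes the UTF-8 encoding needs for the given string.
    public static int Utf8Length(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

For example, Utf8Length("A") is 1, Utf8Length("\u00E9") (é) is 2, Utf8Length("\u20AC") (€) is 3, and Utf8Length("\U0001F600") (😀) is 4.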
Unicode is a standard that assigns numbers to characters from almost all languages. Every Unicode character is identified by a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.
.NET represents strings as a sequence of UTF-16 code units. A Unicode code point outside the Basic Multilingual Plane (BMP) is split into a high and a low surrogate. The lower 10 bits of each surrogate form half of the code point value, after 0x10000 has been subtracted from it.
There are helpers to detect these surrogates (e.g. Char.IsHighSurrogate and Char.IsLowSurrogate).
You need to handle this yourself.
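The surrogate arithmetic can be sketched as follows: each surrogate carries 10 bits, and the combined 20-bit value is offset by 0x10000. This is an illustrative sketch with made-up class and method names; in real code the built-in char.ConvertToUtf32 does the same job.

```csharp
using System;
using System.Collections.Generic;

public static class SurrogateMath // hypothetical helper name
{
    // Combines a UTF-16 surrogate pair into the code point it encodes.
    public static int Combine(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
        {
            throw new ArgumentException("Expected a high surrogate followed by a low surrogate.");
        }

        int upperTenBits = high - 0xD800; // payload of the high surrogate
        int lowerTenBits = low - 0xDC00;  // payload of the low surrogate
        return 0x10000 + (upperTenBits << 10) + lowerTenBits;
    }

    // Lists the code points of a string, merging valid surrogate pairs.
    public static List<int> CodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                result.Add(Combine(text[i], text[i + 1]));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(text[i]); // BMP character or unpaired surrogate
            }
        }
        return result;
    }
}
```

For the pair 0xD83D, 0xDE00 this gives 0x10000 + (0x3D << 10) + 0x200 = 0x1F600, the code point of 😀. Applied to the family emoji, CodePoints returns 7 values, which is why splitting on code points alone is not enough to keep it in one piece.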