
Safely taking a substring of a string with multibyte characters in C#

Tags:

string

c#

I'm trying to take a substring of a string containing multibyte characters, and I'm not getting the results I expect. I am working with strings like 😂test. The first character is a 4-byte character, so calling ToCharArray on this string returns:

  • 55357 #bytes 1 and 2 of the first character
  • 56384 #bytes 3 and 4 of the first character
  • 116 #t
  • 101 #e
  • 115 #s
  • 116 #t

So when I call .Substring(1) on this string, it returns an invalid string that starts with the third and fourth bytes of the first character, not 'test'. Is there any way to get .Substring and other string operations to treat that character as a single unit?

Ceilingfish asked Apr 08 '14 10:04




1 Answer

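Use StringInfo (in the System.Globalization namespace), which works in text elements rather than individual char values:

    // requires: using System.Globalization;
    var yourstring = "😂test";
    StringInfo si = new StringInfo(yourstring);
    // Skip the first text element (the emoji); substring is "test".
    var substring = si.SubstringByTextElements(1);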
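
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The class name TextElementDemo and the helper SubstringFrom are hypothetical names made up for illustration; only StringInfo, LengthInTextElements, SubstringByTextElements and char.IsSurrogatePair come from .NET. The helper returns an empty string when the start index is at or past the end, on the assumption that SubstringByTextElements may throw there, which is not the same as what string.Substring does.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: like string.Substring(start), but the index
    // counts text elements instead of UTF-16 code units.
    public static string SubstringFrom(string s, int startElement)
    {
        var info = new StringInfo(s);

        // Assumption: guard the "start at or past the end" case ourselves,
        // because SubstringByTextElements may throw for it.
        if (startElement >= info.LengthInTextElements)
        {
            return string.Empty;
        }

        return info.SubstringByTextElements(startElement);
    }

    public static void Main()
    {
        // 😂test, written with an escape so the source file encoding does not matter.
        string s = "\U0001F602test";

        Console.WriteLine(s.Length);                               // 6 UTF-16 code units
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 5 text elements
        Console.WriteLine(char.IsSurrogatePair(s, 0));             // True
        Console.WriteLine(SubstringFrom(s, 1));                    // test
    }
}
```

The difference between Length (6) and LengthInTextElements (5) is the reason .Substring(1) cuts the emoji in half: string indexes count char values, and this emoji takes two of them.
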
Sameer answered Oct 07 '22 21:10
