UTF-16 safe substring in C# .NET

I want to get a substring of a given length, say 150. However, I want to make sure I don't cut the string off in the middle of a Unicode character.

For example, consider the following code:

var str = "Hello😀 world!";
var substr = str.Substring(0, 6);

Here substr is an invalid string: the smiley is stored as a surrogate pair (two UTF-16 code units), and the substring ends after the first half of that pair.
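To make the problem concrete, here is a minimal sketch of a check for this situation. The class and method names (SurrogateCheck, EndsWithLoneHighSurrogate) are made up for illustration; only char.IsHighSurrogate comes from .NET.

```csharp
using System;

static class SurrogateCheck
{
    // Hypothetical helper: returns true when the string ends on a high
    // surrogate, meaning a supplementary character was cut in half.
    public static bool EndsWithLoneHighSurrogate(string s)
    {
        return s.Length > 0 && char.IsHighSurrogate(s[s.Length - 1]);
    }
}
```

With the string above, SurrogateCheck.EndsWithLoneHighSurrogate(str.Substring(0, 6)) is true, while the same check on str.Substring(0, 7) is false, because the seventh code unit completes the pair.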

Instead, I want a function that behaves as follows:

var str = "Hello😀 world!";
var substr = str.UnicodeSafeSubstring(0, 6);

where substr contains "Hello😀".

For reference, here is how I would do it in Objective-C using rangeOfComposedCharacterSequencesForRange:

NSString* str = @"Hello😀 world!";
NSRange range = [str rangeOfComposedCharacterSequencesForRange:NSMakeRange(0, 6)];
NSString* substr = [str substringWithRange:range];

What is the equivalent code in C#?

Kostub Deshmukh asked Aug 11 '15 07:08


1 Answer

It looks like you want to split the string on graphemes, that is, on single displayed characters.

In that case, there is a handy method, StringInfo.SubstringByTextElements (in System.Globalization):

var str = "Hello😀 world!";
var substr = new StringInfo(str).SubstringByTextElements(0, 6);
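Two caveats apply to the call above. It counts text elements rather than UTF-16 code units, so the result can be longer than 6 chars, and it throws ArgumentOutOfRangeException when the string has fewer text elements than requested. A minimal sketch of a wrapper that clamps the length is shown below. The extension method name UnicodeSafeSubstring is taken from the question and is not a .NET API, and the clamping behaviour is my assumption about what is wanted.

```csharp
using System;
using System.Globalization;

static class StringExtensions
{
    // Hypothetical helper named after the method in the question.
    // startIndex and length count text elements, not UTF-16 code units.
    public static string UnicodeSafeSubstring(this string s, int startIndex, int length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (startIndex < 0) throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        var info = new StringInfo(s);
        int total = info.LengthInTextElements;

        // Nothing to take at or past the end of the string.
        if (startIndex >= total || length == 0) return string.Empty;

        // Clamp so that asking for more elements than remain does not throw.
        int count = Math.Min(length, total - startIndex);
        return info.SubstringByTextElements(startIndex, count);
    }
}
```

With this in place, str.UnicodeSafeSubstring(0, 6) gives "Hello😀", and str.UnicodeSafeSubstring(0, 150) returns the whole string instead of throwing.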
Lucas Trzesniewski answered Oct 30 '22 03:10