On Windows Phone, I want to truncate any given string to the equivalent of 100 ASCII characters, that is, 100 bytes.
String.Length is useless for this, because the number of bytes per character varies: in UTF-8 a Chinese character typically takes 3 bytes, while Danish letters such as æ, ø and å and Russian letters take 2 bytes each.
The only available encodings are UTF-8 and UTF-16. So what do I do?
The idea is this:
private static string UnicodeSubstring(string text, int length)
{
    var bytes = Encoding.UTF8.GetBytes(text);
    return Encoding.UTF8.GetString(bytes, 0, Math.Min(bytes.Length, length));
}
But the cut has to fall on a character boundary, so that the last character is not split in the middle of its byte sequence and is always rendered correctly.
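One way to keep the byte-based idea and still land on a character boundary is to step the cut back while it points at a UTF-8 continuation byte (a byte of the form 10xxxxxx). The following is only a sketch: the class and method names are made up for illustration, and it assumes the input is well-formed UTF-16, since Encoding.UTF8 replaces unpaired surrogates with U+FFFD when encoding.

```csharp
using System;
using System.Text;

public static class Utf8ByteTruncation
{
    // Hypothetical helper: returns the longest prefix of text whose
    // UTF-8 encoding fits within byteLimit bytes.
    public static string TruncateToUtf8Bytes(string text, int byteLimit)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        if (bytes.Length <= byteLimit)
        {
            return text;
        }

        int end = Math.Max(byteLimit, 0);

        // bytes[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is inside a character,
        // so move back until it sits on a lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80)
        {
            end--;
        }

        return Encoding.UTF8.GetString(bytes, 0, end);
    }
}
```

Because the check looks only at the byte pattern, this handles 4-byte sequences (characters outside the BMP) the same way as 2- and 3-byte ones, at the cost of encoding the whole string first.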
One option is to simply go through the string, computing the number of bytes for each character.
If you know you don't need to deal with characters outside the BMP (the Basic Multilingual Plane, U+0000 to U+FFFF), this is reasonably simple:
public string SubstringWithinUtf8Limit(string text, int byteLimit)
{
    int byteCount = 0;
    char[] buffer = new char[1];
    for (int i = 0; i < text.Length; i++)
    {
        buffer[0] = text[i];
        byteCount += Encoding.UTF8.GetByteCount(buffer);
        if (byteCount > byteLimit)
        {
            // This character would exceed the limit, so return
            // everything before it.
            return text.Substring(0, i);
        }
    }
    return text;
}
It becomes slightly trickier if you have to handle surrogate pairs :(
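For what it's worth, here is a rough sketch of how the loop might be extended to cope with surrogate pairs: when two adjacent chars form a valid pair, count them together (4 bytes in UTF-8) and advance by two, so the cut never lands between the halves. The class and method names are placeholders, and I'm assuming char.IsSurrogatePair(string, int) is available on the target platform. An unpaired surrogate is still counted on its own, which Encoding.UTF8 by default treats as the 3-byte replacement character.

```csharp
using System;
using System.Text;

public static class Utf8Truncation
{
    // Hypothetical surrogate-aware variant of the loop above.
    public static string SubstringWithinUtf8LimitSurrogateAware(string text, int byteLimit)
    {
        int byteCount = 0;
        char[] buffer = new char[2];
        int i = 0;

        while (i < text.Length)
        {
            // A valid surrogate pair is two chars but a single code point.
            int charCount = char.IsSurrogatePair(text, i) ? 2 : 1;

            buffer[0] = text[i];
            if (charCount == 2)
            {
                buffer[1] = text[i + 1];
            }

            byteCount += Encoding.UTF8.GetByteCount(buffer, 0, charCount);
            if (byteCount > byteLimit)
            {
                // This code point would exceed the limit, so return
                // everything before it.
                return text.Substring(0, i);
            }

            i += charCount;
        }

        return text;
    }
}
```

Note that this keeps code points intact but not necessarily what a user perceives as one character: a base letter followed by a combining mark can still be separated.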