I'm saving some strings from a third party into my database (postgres). Sometimes these strings are too long and need to be truncated to fit into the column in my table.
On some random occasions I accidentally truncate the string right in the middle of a Unicode character, which gives me a "broken" string that I cannot save into the database. I get the following error: Unable to translate Unicode character \uD83D at index XXX to specified code page.
I've created a minimal example to show you what I mean. Here I have a string that contains a Unicode character ("Small blue diamond" 🔹 U+1F539). Depending on where I truncate, it gives me a valid string or not.
var myString = @"This is a string before an emoji:🔹 This is after the emoji.";
var brokenString = myString.Substring(0, 34);
// Gives: "This is a string before an emoji:☐" (ends with only half of the emoji)
var validString = myString.Substring(0, 35);
// Gives: "This is a string before an emoji:🔹"
Is there a way for me to truncate the string without accidentally breaking any Unicode chars?
A single Unicode character may be represented by more than one char (UTF-16 code unit). 🔹, for example, is stored as a surrogate pair of two chars. string.Substring counts chars, so it can cut such a pair in half, and that is the problem you are having.
You may convert your string to a StringInfo object and then use its SubstringByTextElements() method to take the substring by text element count rather than by char count.
See a C# demo:
using System;
using System.Globalization;

Console.WriteLine("🔹".Length); // => 2
Console.WriteLine(new StringInfo("🔹").LengthInTextElements); // => 1
var myString = @"This is a string before an emoji:🔹This is after the emoji.";
var teMyString = new StringInfo(myString);
Console.WriteLine(teMyString.SubstringByTextElements(0, 33));
// => This is a string before an emoji:
Console.WriteLine(teMyString.SubstringByTextElements(0, 34));
// => This is a string before an emoji:🔹
Console.WriteLine(teMyString.SubstringByTextElements(0, 35));
// => This is a string before an emoji:🔹T
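If your limit is a maximum number of chars rather than a number of text elements, one possible approach is to find where each text element starts and cut at the last start that still fits. The sketch below assumes that is what you need. The class TextTruncation and the method TruncateToCharBudget are hypothetical names made up for this example, not part of .NET. The only framework call it relies on is StringInfo.ParseCombiningCharacters.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the .NET libraries.
public static class TextTruncation
{
    // Returns the longest prefix of value that is at most maxChars chars
    // (UTF-16 code units) long and does not end inside a text element.
    public static string TruncateToCharBudget(string value, int maxChars)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        // Nothing to cut if the whole string already fits.
        if (value.Length <= maxChars) return value;

        // Index of the first char of every text element in the string.
        int[] starts = StringInfo.ParseCombiningCharacters(value);

        // Pick the largest element start that does not exceed the budget.
        // Cutting there never leaves half of a surrogate pair behind.
        int cut = 0;
        foreach (int start in starts)
        {
            if (start > maxChars) break;
            cut = start;
        }

        return value.Substring(0, cut);
    }
}
```

With the demo string above, TextTruncation.TruncateToCharBudget(myString, 34) returns "This is a string before an emoji:" (the emoji is dropped whole instead of being split), and a budget of 35 keeps the emoji. Note that the budget here is counted the same way string.Length counts. Whether that matches how your column measures length is something to verify for your setup.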