How do I convert different Unicode characters to their closest ASCII equivalents, for example Ä -> A? I googled but didn't find any suitable solution. The trick Encoding.ASCII.GetBytes("Ä")[0] didn't work (the result was ?).
I found that there is a class Encoder that has a Fallback property, which is exactly for cases when a char can't be converted, but the implementations (EncoderReplacementFallback) are stupid and just convert to ?.
Any ideas?
You CAN'T convert from Unicode to ASCII in general. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same code points in ASCII as in Unicode, and the same byte values in UTF-8, which is probably what you have.
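To make that concrete, here is a small sketch in Python (used purely for illustration, not the C# from the question). It shows that ASCII-range text yields identical bytes under ASCII and UTF-8, while Ä has no ASCII form and turns into ? once replacement is requested, much like the Encoding.ASCII result in the question. The variable names are made up for this example.

```python
# ASCII-range text produces identical bytes under ASCII and UTF-8.
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

umlaut = "\u00c4"  # the precomposed character 'Ä'

# A strict ASCII encode has nothing to map 'Ä' to, so it raises.
try:
    umlaut.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The "replace" error handler substitutes '?' instead of raising.
replaced = umlaut.encode("ascii", errors="replace")

# In UTF-8 the same character takes two bytes.
utf8_umlaut = umlaut.encode("utf-8")

print(same_bytes, strict_failed, replaced, utf8_umlaut)
```

So the ? is not a bug in the encoder; it is the only thing a plain ASCII encoder can do without an explicit mapping.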
Python's ord() function returns the Unicode code point of a given character. It accepts a string of length one as its argument and returns the corresponding code point as an integer.
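A minimal sketch of ord() in use follows. The helper is_ascii_char is a hypothetical name introduced here, not something from the standard library; it simply checks whether a character falls inside the 7-bit range.

```python
# ord() maps a one-character string to its integer code point;
# chr() is the inverse.
code_a = ord("A")            # inside the 7-bit ASCII range
code_umlaut = ord("\u00c4")  # U+00C4 ('Ä'), outside ASCII


def is_ascii_char(ch: str) -> bool:
    """Hypothetical helper: True when ch is a single ASCII character."""
    return len(ch) == 1 and ord(ch) < 128


print(code_a, code_umlaut, is_ascii_char("A"), is_ascii_char("\u00c4"))
```

Passing a string longer than one character to ord() raises a TypeError, which is why the helper checks the length first.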
Unicode represents far more characters than ASCII. ASCII uses a 7-bit range to encode just 128 distinct characters. Unicode, on the other hand, encodes characters from 154 written scripts (the count grows with each Unicode version).
These characters include letters from many alphabets, Greek symbols, emoji, and so on. ASCII and Unicode are two widely used character encoding standards: ASCII covers the basic Latin letters, digits, punctuation, and control codes, whereas Unicode covers text from most of the world's writing systems, along with a large set of symbols.
If you just need to remove the diacritical marks, see this answer:
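// Requires: using System.Globalization; using System.Text;
static string RemoveDiacritics(string stIn) {
    // Decompose each accented character into base letter + combining marks.
    string stFormD = stIn.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int ich = 0; ich < stFormD.Length; ich++) {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        // Keep everything except the combining (non-spacing) marks.
        if (uc != UnicodeCategory.NonSpacingMark) {
            sb.Append(stFormD[ich]);
        }
    }
    // Recompose whatever is left.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}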
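For comparison, the same decompose, filter, recompose idea can be sketched in Python with the standard unicodedata module. This is an illustrative equivalent, not a port anyone in this thread posted, and remove_diacritics is a name chosen for the example.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Strip combining marks, mirroring the FormD / NonSpacingMark approach."""
    # NFD splits e.g. 'Ä' into 'A' followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" is "Mark, nonspacing", the counterpart of NonSpacingMark.
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    # Recompose whatever remains.
    return unicodedata.normalize("NFC", "".join(kept))


print(remove_diacritics("\u00c4"))
print(remove_diacritics("cr\u00e8me br\u00fbl\u00e9e"))
```

Note that this only helps with characters that actually decompose. Letters such as ß or Ø have no canonical decomposition, so they pass through unchanged and still need an explicit mapping if the output must be pure ASCII.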
MS Dynamics has a problem where it won't allow any character outside of x20 to x7f, and some characters within that range are also invalid. My answer was to create an array, keyed by the invalid character codes, that returns a best guess at a valid replacement character.
It ain't pretty, but it works.
Function PlainAscii(InText)
    Dim i, c, a
    Const cUTF7 = "^[\x20-\x7e]+$"
    Const IgnoreCase = False
    PlainAscii = ""
    If InText = "" Then Exit Function
    If RegExTest(InText, cUTF7, IgnoreCase) Then
        PlainAscii = InText
    Else
        For i = 1 To Len(InText)
            c = Mid(InText, i, 1)
            a = Asc(c)
            If a = 10 Or a = 13 Or a = 9 Then
                ' Do Nothing - Allow LF, CR & TAB
            ElseIf a < 32 Then
                c = " "
            ElseIf a > 126 Then
                c = CvtToAscii(a)
            End If
            PlainAscii = PlainAscii & c
        Next
    End If
End Function
Function CvtToAscii(inChar)
    ' Maps the characters with the 8th bit set to 7-bit characters.
    ' The table's first entry corresponds to code 126, so index = code - 126
    ' (e.g. 128 -> "$", 199 -> "C"). Entries appear to follow the Windows-1252 layout.
    Dim arrChars
    arrChars = Array(" ", " ", "$", " ", ",", "f", """", " ", "t", "t", "^", "%", "S", "<", "O", " ", "Z", " ", " ", "'", "'", """", """", ".", "-", "-", "~", "T", "S", ">", "o", " ", "Z", "Y", " ", "!", "$", "$", "o", "$", "|", "S", " ", "c", " ", " ", " ", "_", "R", "_", ".", " ", " ", " ", " ", "u", "P", ".", ",", "i", " ", " ", " ", " ", " ", " ", "A", "A", "A", "A", "A", "A", "A", "C", "E", "E", "E", "E", "I", "I", "I", "I", "D", "N", "O", "O", "O", "O", "O", "X", "O", "U", "U", "U", "U", "Y", "b", "B", "a", "a", "a", "a", "a", "a", "a", "c", "e", "e", "e", "e", "i", "i", "i", "i", "o", "n", "o", "o", "o", "o", "o", "/", "O", "u", "u", "u", "u", "y", "p", "y")
    CvtToAscii = arrChars(inChar - 126)
End Function
Function RegExTest(ByVal strStringToSearch, strExpression, IgnoreCase)
    Dim objRegEx
    On Error Resume Next
    Err.Clear
    strStringToSearch = Replace(Replace(strStringToSearch, vbCr, ""), vbLf, "")
    RegExTest = False
    Set objRegEx = New RegExp
    With objRegEx
        .Pattern = strExpression '//the reg expression that should be searched for
        If Err.Number = 0 Then
            .IgnoreCase = CBool(IgnoreCase) '//not case sensitive
            .Global = True '//match all instances of pattern
            RegExTest = .Test(strStringToSearch)
        End If
    End With
    Set objRegEx = Nothing
    On Error Goto 0
End Function
Your answer is necessarily going to be different.
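The lookup-table idea itself is not tied to VBScript. As a rough sketch, here is how it might look in Python using str.translate. The FALLBACKS table, the to_plain_ascii name, and the "?" placeholder are all hypothetical choices made for this example; a real table would be much larger and tuned to whatever the target system accepts. Unlike the VBScript above, this sketch does not touch control characters.

```python
# Hypothetical, deliberately tiny mapping from non-ASCII characters to
# best-guess ASCII replacements.
FALLBACKS = {
    "\u00df": "ss",   # sharp s
    "\u00d8": "O",    # O with stroke
    "\u00e6": "ae",   # ae ligature
    "\u20ac": "EUR",  # euro sign
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
}
TABLE = str.maketrans(FALLBACKS)


def to_plain_ascii(text: str, placeholder: str = "?") -> str:
    """Apply the table, then replace anything still outside ASCII."""
    translated = text.translate(TABLE)
    return "".join(ch if ord(ch) < 128 else placeholder for ch in translated)


print(to_plain_ascii("Stra\u00dfe"))
print(to_plain_ascii("\u20ac5"))
```

Anything the table does not cover falls through to the placeholder, which makes gaps in the mapping easy to spot in the output.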