Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can you strip non-ASCII characters from a string? (in C#)

Tags:

c#

ascii

People also ask

How do I remove Unicode characters from a string?

In python, to remove Unicode ” u “ character from string then, we can use the replace() method to remove the Unicode ” u ” from the string. After writing the above code (python remove Unicode ” u ” from a string), Ones you will print “ string_unicode ” then the output will appear as a “ Python is easy. ”.

What is a non ASCII character?

Non-ASCII characters are those that are not encoded in ASCII, such as Unicode, EBCDIC, etc. ASCII is limited to 128 characters and was initially developed for the English language.


string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);

Here is a pure .NET solution that doesn't use regular expressions:

string inputString = "Räksmörgås";
string asAscii = Encoding.ASCII.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.GetEncoding(
            Encoding.ASCII.EncodingName,
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback()
            ),
        Encoding.UTF8.GetBytes(inputString)
    )
);

It may look cumbersome, but it should be intuitive. It uses the .NET ASCII encoding to convert a string. UTF8 is used during the conversion because it can represent any of the original characters. It uses an EncoderReplacementFallback to to convert any non-ASCII character to an empty string.


I believe MonsCamus meant:

parsememo = Regex.Replace(parsememo, @"[^\u0020-\u007E]", string.Empty);

If you want not to strip, but to actually convert latin accented to non-accented characters, take a look at this question: How do I translate 8bit characters into 7bit characters? (i.e. Ü to U)