Removing control characters from a UTF-8 string

I found this question, but its solution also removes all valid UTF-8 characters: it returns a blank string, even though the input contains valid UTF-8 characters as well as control characters. From what I have read about UTF-8, there is no specific range for control characters, and each character set has its own control characters.

How can I modify the above solution to remove only control characters?

642

asked Jul 23 '11 09:07

Xaqron

2 Answers

This is how I roll:

Regex.Replace(evilWeirdoText, @"[\u0000-\u001F]", string.Empty)

This strips out the first 32 control characters (\u0000 through \u001F). The next hex value up from \u001F is \u0020, AKA the space. Everything before the space is all the line feed and null nonsense.

If you don't believe me about the characters, see: http://donsnotes.com/tech/charsets/ascii.html
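The pattern above covers only \u0000 through \u001F, so DEL (\u007F) and the C1 controls (\u0080 through \u009F) are left in place. As a rough sketch, assuming you want those gone too, the Unicode category `\p{Cc}` can be used instead. The class and method names here are made up for illustration, and both variants also strip tab, carriage return, and line feed:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class ControlCharRegexSketch
{
    // C0 controls only (U+0000 through U+001F), same pattern as the answer.
    private static readonly Regex C0Only = new Regex(@"[\u0000-\u001F]");

    // Every character in Unicode category Cc:
    // C0 controls, DEL (U+007F), and C1 controls (U+0080 through U+009F).
    private static readonly Regex AllControls = new Regex(@"\p{Cc}");

    public static string StripC0(string text)
    {
        return C0Only.Replace(text, string.Empty);
    }

    public static string StripAllControls(string text)
    {
        return AllControls.Replace(text, string.Empty);
    }
}
```

For example, `StripC0("a\u0007b\u007Fc")` leaves the DEL character in the result, while `StripAllControls("a\u0007b\u007Fc")` returns `"abc"`. Accented letters such as `é` are untouched by either pattern.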

169

answered Sep 21 '22 08:09

BritishDeveloper

I think the following code will work for you:

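using System.Text; // for StringBuilder

public static string RemoveControlCharacters(string inString)
{
    if (inString == null) return null;
    // The result is never longer than the input, so pre-size the builder.
    StringBuilder newString = new StringBuilder(inString.Length);
    foreach (char ch in inString)
    {
        // char.IsControl is true for U+0000-U+001F and U+007F-U+009F.
        if (!char.IsControl(ch))
        {
            newString.Append(ch);
        }
    }
    return newString.ToString();
}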
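Note that `char.IsControl` also reports tab, carriage return, and line feed as control characters, so the method above flattens multi-line text. If that is not what you want, here is a minimal sketch of a variant that keeps those three. The class and method names are hypothetical, and which characters count as "layout" is an assumption you should adjust:

```csharp
using System.Text;

// Hypothetical helper class, not part of the original answer.
public static class ControlCharFilterSketch
{
    // Removes control characters but keeps tab, carriage return, and line feed.
    public static string RemoveControlCharactersKeepLayout(string input)
    {
        if (input == null) return null;

        var result = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            // Assumption: these three are the only control characters worth keeping.
            bool isLayout = ch == '\t' || ch == '\r' || ch == '\n';
            if (isLayout || !char.IsControl(ch))
            {
                result.Append(ch);
            }
        }
        return result.ToString();
    }
}
```

Calling `RemoveControlCharactersKeepLayout("a\tb\u0000c\r\n")` returns `"a\tbc\r\n"`: the NUL is dropped while the tab and line break stay.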

answered Sep 21 '22 08:09

Centro

Tags: string, c#, utf-8, control-characters
