 

Cannot remove a set of characters from a string

Tags:

c#

regex

I have a set of characters I want to remove from a string: "/\[]:|<>+=;,?*'@

I'm trying this:

private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

private string Clean(string stringToClean)
{
    return Regex.Replace(stringToClean, "[" + Regex.Escape(CHARS_TO_REPLACE) + "]", "");
}

However, with an input such as "Foo, bar and other", the result is identical to the input.

What is wrong with my code?

This looks a lot like this question, but with a blacklist instead of a whitelist of characters, so I removed the negating ^ character.

Steve B asked Jun 09 '26 06:06


2 Answers

You didn't escape the closing square bracket (]) in CHARS_TO_REPLACE.
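As a minimal sketch, assuming .NET 6 or later with top-level statements (the question used a class with a private constant; the local constant and local function here are only for brevity), this is the question's Clean method with that one fix applied: the ] that Regex.Escape leaves untouched is escaped by hand, so it no longer closes the character class early.

```csharp
using System;
using System.Text.RegularExpressions;

const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

// Same approach as the question, but ']' is escaped by hand because
// Regex.Escape leaves it alone and it would otherwise end the [...] class.
string Clean(string stringToClean)
{
    string charClass = Regex.Escape(CHARS_TO_REPLACE).Replace("]", @"\]");
    return Regex.Replace(stringToClean, "[" + charClass + "]", "");
}

Console.WriteLine(Clean("Foo, bar and other")); // Foo bar and other
```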

bluevector answered Jun 11 '26 19:06


The problem is a misunderstanding of how Regex.Escape works. From MSDN:

Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes.
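A quick way to see this list in action is to print what Regex.Escape returns for the question's characters. This is a small sketch assuming a .NET console project with top-level statements; note that [ comes back escaped while ] (and -, which is also not in the list) come back unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

// Regex.Escape only touches characters in its documented set;
// ']' and '-' are not in that set, so they are returned as-is.
string escaped = Regex.Escape(CHARS_TO_REPLACE);
Console.WriteLine(escaped); // "/\\\[]:\|<>\+=;,\?\*'@
```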

It works as expected, but you need to think of Regex.Escape as escaping metacharacters outside of a character class. Inside a character class, the characters that need escaping are different. For example, inside a character class a - should be escaped to be treated literally; otherwise it can define a range of characters (e.g., [A-Z]).
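An alternative to post-processing the output of Regex.Escape is to escape only the characters that are special inside a character class. The helper below, EscapeForCharClass, is a hypothetical name used for illustration, and the sketch assumes .NET's rules, where \, ], ^ and - are the characters worth escaping inside [...]; [ is escaped too, which is harmless and avoids any confusion with .NET character class subtraction. It also assumes the input set is non-empty, since "[]" is not a valid class.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: escapes only the characters that have a special
// meaning inside a .NET character class ([...]). Assumes chars is non-empty.
string EscapeForCharClass(string chars)
{
    var sb = new StringBuilder();
    foreach (char c in chars)
    {
        if (c is '\\' or ']' or '[' or '^' or '-')
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.ToString();
}

// Unescaped, "[a-z]" is a range and removes every lowercase letter.
Console.WriteLine(Regex.Replace("a-b-z", "[a-z]", ""));                               // --
// Escaped, "[a\-z]" matches only the literal characters 'a', '-' and 'z'.
Console.WriteLine(Regex.Replace("a-b-z", "[" + EscapeForCharClass("a-z") + "]", "")); // b
```

Characters such as |, +, ? and * are left alone because they are already literal inside a character class.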

In your case, as others have mentioned, the ] was not escaped. Any character that has a special meaning within a character class needs to be handled separately after calling Regex.Escape. This should do what you need:

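using System;
using System.Text.RegularExpressions;

string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";
// Regex.Escape does not escape ']', so escape it manually for use inside [...]
string pattern = "[" + Regex.Escape(CHARS_TO_REPLACE).Replace("]", @"\]") + "]";

string input = "hi\" there\\ [i love regex];@";
string result = Regex.Replace(input, pattern, "");
Console.WriteLine(result); // hi there i love regex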

Without that extra Replace, you were ending up with ["/\\\[]:\|<>\+=;,\?\*'@], which doesn't have the ] escaped. So it was really ["/\\\[] as a character class, followed by :\|<>\+=;,\?\*'@] as the rest of the pattern. That only matches one of the characters ", /, \ or [ immediately followed by exactly those remaining characters, which is why your input came back unchanged.
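To check this reading of the broken pattern, the sketch below (assuming top-level statements in a .NET console app) builds the question's original pattern and runs it against two inputs: the question's sample text, which is left unchanged, and a string made of one character from the short class followed by the literal tail, which is removed.

```csharp
using System;
using System.Text.RegularExpressions;

const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

// The question's pattern: the unescaped ']' closes the class right after '\[',
// so the remaining characters must appear literally, ending with a literal ']'.
string broken = "[" + Regex.Escape(CHARS_TO_REPLACE) + "]";

Console.WriteLine(Regex.Replace("Foo, bar and other", broken, "")); // Foo, bar and other
Console.WriteLine(Regex.Replace("x/:|<>+=;,?*'@]y", broken, ""));   // xy
```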

Ahmad Mageed answered Jun 11 '26 18:06


