
Clean the string? Is there any better way of doing it?

I am using this method to clean the string:

public static string CleanString(string dirtyString)
{
    string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
    string result = dirtyString;

    foreach (char c in removeChars)
    {
        result = result.Replace(c.ToString(), string.Empty);
    }

    return result;
}

This method works fine, but it has a performance problem. Every time I pass in a string, every character to remove goes through the loop, so with a large string it takes too long to return the result.

Is there a better way of doing the same thing, for example with LINQ or jQuery / JavaScript?

Any suggestions would be appreciated.

asked Jul 09 '12 13:07 by patel.milanb


2 Answers

OK, consider the following test:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class CleanString
{
    //by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
    public static string UseRegex(string strIn)
    {
        // Replace invalid characters with empty strings.
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }

    // by Paolo Tedesco
    public static String UseStringBuilder(string strIn)
    {
        const string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by Paolo Tedesco, but using a HashSet
    public static String UseStringBuilderWithHashSet(string strIn)
    {
        var hashSet = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by SteveDog
    public static string UseStringBuilderWithHashSet2(string dirtyString)
    {
        HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
        StringBuilder result = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
            if (!removeChars.Contains(c)) // keep only characters that are not in removeChars
                result.Append(c);
        return result.ToString();
    }

    // original by patel.milanb
    public static string UseReplace(string dirtyString)
    {
        string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
        string result = dirtyString;

        foreach (char c in removeChars)
        {
            result = result.Replace(c.ToString(), string.Empty);
        }

        return result;
    }

    // by L.B
    public static string UseWhere(string dirtyString)
    {
        return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
    }
}

static class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main()
    {
        var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf";
        var sw = new Stopwatch();

        var iterations = 50000;

        sw.Start();
        for (var i = 0; i < iterations; i++)
            CleanString.<SomeMethod>(dirtyString);
        sw.Stop();
        Debug.WriteLine("CleanString.<SomeMethod>: " + sw.ElapsedMilliseconds.ToString());
        sw.Reset();

        ....
        <repeat>
        ....
    }
}

Output

CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233

Conclusion

It probably does not matter which method you use.

The difference in time between the fastest (UseWhere: 233ms) and the slowest (UseStringBuilder: 2805ms) method is 2572ms when called 50000(!) times in a row. You probably don't need to care about it if you don't run the method that often.

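But if you do, use the UseWhere method (written by L.B). Note that it behaves slightly differently: it keeps only letters and digits (Char.IsLetterOrDigit), so it also strips characters such as '.', '/' and '=' that the list-based methods leave in place.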
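To make the difference concrete, here is a minimal sketch that runs the same string through a list-based filter and through the Char.IsLetterOrDigit filter. The class name Demo, the helper names Blacklist and Whitelist, and the sample input are made up for illustration and are not part of the benchmark above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Demo
{
    // Blacklist: drops only the characters listed in the set.
    static string Blacklist(string s)
    {
        var removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
        return new string(s.Where(c => !removeChars.Contains(c)).ToArray());
    }

    // Whitelist: keeps only letters and digits, like UseWhere.
    static string Whitelist(string s)
    {
        return new string(s.Where(char.IsLetterOrDigit).ToArray());
    }

    static void Main()
    {
        const string input = "a.b/c=d (e)"; // hypothetical sample input

        Console.WriteLine(Blacklist(input)); // a.b/c=de
        Console.WriteLine(Whitelist(input)); // abcde
    }
}
```

If the dots, slashes or equals signs matter in your data, the two approaches are not interchangeable.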

answered Sep 20 '22 23:09 by sloth



If it's purely speed and efficiency you are after, I would recommend doing something like this:

public static string CleanString(string dirtyString)
{
    HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
    StringBuilder result = new StringBuilder(dirtyString.Length);
    foreach (char c in dirtyString)
        if (!removeChars.Contains(c)) // prevent dirty chars
            result.Append(c);
    return result.ToString();
}

Regex is certainly an elegant solution, but it adds extra overhead. Because the StringBuilder is given its starting capacity up front, it only needs to allocate memory once (plus a second time for the ToString at the end). This cuts down on memory usage and increases speed, especially on longer strings.
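The same single-allocation idea can also be sketched with a plain char array instead of a StringBuilder. This is a variant for illustration, not the code from this answer: the class name CharBufferDemo and the static RemoveChars field are assumptions, and whether it is measurably faster would have to be checked with a benchmark like the one above.

```csharp
using System;
using System.Collections.Generic;

static class CharBufferDemo
{
    // Built once instead of on every call (an assumption of this sketch).
    static readonly HashSet<char> RemoveChars =
        new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");

    public static string CleanString(string dirtyString)
    {
        // The result can never be longer than the input,
        // so a single buffer of the input length is enough.
        char[] buffer = new char[dirtyString.Length];
        int length = 0;

        foreach (char c in dirtyString)
            if (!RemoveChars.Contains(c))
                buffer[length++] = c;

        // Second allocation: copy the used part of the buffer into a string.
        return new string(buffer, 0, length);
    }

    static void Main()
    {
        Console.WriteLine(CleanString("hello, world!")); // helloworld
    }
}
```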

However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
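As a rough sketch of that alternative: HtmlEncode escapes the markup characters instead of deleting them. The class name and the sample input are hypothetical, and the example assumes the project can reference System.Web (or System.Web.HttpUtility on newer .NET versions).

```csharp
using System;
using System.Web; // assumption: System.Web is referenced by the project

static class HtmlEncodeDemo
{
    static void Main()
    {
        // Hypothetical user input containing markup characters.
        string userInput = "<b>Tom & Jerry</b>";

        // The characters are escaped, not removed, so no information is lost.
        string encoded = HttpUtility.HtmlEncode(userInput);

        Console.WriteLine(encoded); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
    }
}
```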

answered Sep 22 '22 23:09 by Steven Doggart
