I am using this method to clean a string:
```csharp
public static string CleanString(string dirtyString)
{
    string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
    string result = dirtyString;
    // One full pass (and one new string) per character in removeChars.
    foreach (char c in removeChars)
    {
        result = result.Replace(c.ToString(), string.Empty);
    }
    return result;
}
```
This method works fine, but it has a performance problem. For every character in removeChars, Replace scans the whole string and builds a new one, so with a large string it takes too long to return the result.
Is there a better way of doing the same thing, for example with LINQ, or with jQuery/JavaScript?
Any suggestion would be appreciated.
OK, consider the following test:
```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class CleanString
{
    // by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
    public static string UseRegex(string strIn)
    {
        // Replace invalid characters with empty strings.
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }

    // by Paolo Tedesco
    public static string UseStringBuilder(string strIn)
    {
        const string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by Paolo Tedesco, but using a HashSet
    public static string UseStringBuilderWithHashSet(string strIn)
    {
        var hashSet = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by SteveDog
    public static string UseStringBuilderWithHashSet2(string dirtyString)
    {
        HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
        StringBuilder result = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
            if (!removeChars.Contains(c)) // keep only characters not in the removal set
                result.Append(c);
        return result.ToString();
    }

    // original by patel.milanb
    public static string UseReplace(string dirtyString)
    {
        string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
        string result = dirtyString;
        foreach (char c in removeChars)
        {
            result = result.Replace(c.ToString(), string.Empty);
        }
        return result;
    }

    // by L.B
    public static string UseWhere(string dirtyString)
    {
        return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
    }
}

static class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main()
    {
        var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf";
        var iterations = 50000;

        // Every method is timed on the same input for the same number of calls.
        Func<string, string>[] methods =
        {
            CleanString.UseReplace,
            CleanString.UseStringBuilder,
            CleanString.UseStringBuilderWithHashSet,
            CleanString.UseStringBuilderWithHashSet2,
            CleanString.UseRegex,
            CleanString.UseWhere,
        };

        var sw = new Stopwatch();
        foreach (var method in methods)
        {
            sw.Restart();
            for (var i = 0; i < iterations; i++)
                method(dirtyString);
            sw.Stop();
            Debug.WriteLine("CleanString." + method.Method.Name + ": " + sw.ElapsedMilliseconds);
        }
    }
}
```
Output (elapsed milliseconds for 50000 calls of each method):

```
CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233
```
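Note that UseRegex is not a like-for-like comparison: its MSDN pattern keeps word characters, '.', '@' and '-' and removes everything else, whereas UseReplace and the StringBuilder variants remove only the characters in removeChars. A minimal sketch of a regex that targets the same blacklist might look like this. The class name BlacklistRegex is hypothetical, and this version has not been timed against the numbers above.

```csharp
using System.Text.RegularExpressions;

public static class BlacklistRegex
{
    // Same characters as removeChars. '^' is not the first character, so the
    // class is not negated, and '-' is last, so it is a literal, not a range.
    // Built once and compiled, instead of being parsed on every call.
    private static readonly Regex RemoveChars =
        new Regex("[ ?&^$#@!()+,:;<>\u2019'_*-]", RegexOptions.Compiled);

    public static string Clean(string dirtyString) =>
        RemoveChars.Replace(dirtyString, string.Empty);
}
```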
Conclusion
It probably does not matter which method you use.
The difference between the fastest (UseWhere: 233 ms) and the slowest (UseStringBuilder: 2805 ms) method is 2572 ms when each is called 50000(!) times in a row, which is roughly 0.05 ms per call. You probably don't need to care about it unless you run the method that often.
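But if you do, use the UseWhere method (written by L.B), but note that it behaves slightly differently: it keeps only letters and digits (Char.IsLetterOrDigit), so it also removes characters that are not in removeChars, such as '.', '=', '/', '€' and '§', while keeping non-ASCII letters like 'ö'.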
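To see that difference concretely, here is a small sketch that runs both approaches on the same input. The names CleanComparison, RemoveListed and KeepLettersAndDigits are made up for illustration: RemoveListed mirrors the HashSet/StringBuilder variant (a blacklist), and KeepLettersAndDigits mirrors UseWhere (a whitelist).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CleanComparison
{
    // The blacklist from the question (\u2019 is the ’ character).
    private const string RemoveChars = " ?&^$#@!()+-,:;<>\u2019\'-_*";

    // Blacklist: drop only the characters listed in RemoveChars.
    public static string RemoveListed(string s)
    {
        var set = new HashSet<char>(RemoveChars);
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            if (!set.Contains(c))
                sb.Append(c);
        return sb.ToString();
    }

    // Whitelist (same idea as UseWhere): keep only letters and digits.
    public static string KeepLettersAndDigits(string s) =>
        new string(s.Where(char.IsLetterOrDigit).ToArray());
}
```

On the input "a.b=c€d§ö_e 1", RemoveListed returns "a.b=c€d§öe1" (only '_' and the space are in the list), while KeepLettersAndDigits returns "abcdöe1".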
If it's purely speed and efficiency you are after, I would recommend doing something like this:
```csharp
using System.Collections.Generic;
using System.Text;

public static string CleanString(string dirtyString)
{
    HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
    StringBuilder result = new StringBuilder(dirtyString.Length);
    foreach (char c in dirtyString)
        if (!removeChars.Contains(c)) // skip the characters we want to remove
            result.Append(c);
    return result.ToString();
}
```
Regex is certainly an elegant solution, but it adds extra overhead. Passing the input length as the initial capacity of the StringBuilder means its buffer is allocated only once (plus a second allocation for the string created by ToString at the end). This cuts down on memory usage and increases speed, especially on longer strings.
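The method above still rebuilds the HashSet on every call. If the set of characters is fixed, one possible variant (not part of the original answer and not benchmarked here) builds the set once in a static field and writes into a char[] sized to the input, so the only per-call allocations are that buffer and the final string. FastCleaner is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class FastCleaner
{
    // Built once per process instead of once per call; assumes the set of
    // characters to remove never changes at runtime.
    private static readonly HashSet<char> RemoveChars =
        new HashSet<char>(" ?&^$#@!()+-,:;<>\u2019\'-_*");

    public static string CleanString(string dirtyString)
    {
        // The result can never be longer than the input, so one buffer suffices.
        char[] buffer = new char[dirtyString.Length];
        int count = 0;
        foreach (char c in dirtyString)
        {
            if (!RemoveChars.Contains(c))
                buffer[count++] = c;
        }
        // Copy only the filled part of the buffer into the result string.
        return new string(buffer, 0, count);
    }
}
```

For example, FastCleaner.CleanString("Hello, World! (2024)") returns "HelloWorld2024".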
However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
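A minimal sketch of the encoding route. It uses System.Net.WebUtility.HtmlEncode, which is available in .NET Framework 4.0+ and .NET Core without a reference to System.Web (where HttpUtility lives). HtmlOutput.ForHtml is a hypothetical wrapper.

```csharp
using System.Net;

public static class HtmlOutput
{
    // Encode text for an HTML context instead of stripping characters,
    // so '<', '>' and '&' are displayed rather than interpreted as markup.
    public static string ForHtml(string userText) => WebUtility.HtmlEncode(userText);
}
```

HtmlOutput.ForHtml("<b>Tom & Jerry</b>") returns "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;", so the browser shows the tags as text and nothing is silently lost.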