I am importing a number of records with multiple string fields from an old db to a new db. It seems to be very slow and I suspect it's because I do this:
    foreach (var oldObj in oldDB)
    {
        NewObject newObj = new NewObject();

        newObj.Name = oldObj.Name.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
            .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
            .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');

        newObj.Surname = oldObj.Surname.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
            .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
            .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');

        newObj.Address = oldObj.Address.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
            .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
            .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');

        newObj.Note = oldObj.Note.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
            .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
            .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');

        /* ... some processing ... */
    }
Now, I have read some posts and articles around the Net and have seen many different opinions about this. Some say it would be better to use a regex with MatchEvaluator, some say it's best to leave it as is.
While it's possible that it'd be easier for me to just do a benchmark case for myself, I decided to ask a question here in case someone else has been wondering about the same question, or in case someone knows in advance.
So what is the fastest way to do this in C#?
EDIT
I have posted the benchmark here. At first sight it looks like Richard's way might be the fastest. However, neither his way nor Marc's would do anything because of the wrong Regex pattern. After correcting the pattern from
@"\^@\[\]`\}~\{\\"
to
@"\^|@|\[|\]|`|\}|~|\{|\\"
it appears as if the old way with chained .Replace() calls is the fastest after all.
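To see why the first pattern never matched, here is a minimal sketch. The class and member names are made up for illustration; the three patterns are the ones quoted in this thread. The uncorrected pattern is a single literal run of all nine characters in that exact order, while the alternation (or an equivalent character class) matches any one of them.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original benchmark.
public static class PatternCheck
{
    // Matches only the nine characters appearing together, in this order.
    public const string SequencePattern = @"\^@\[\]`\}~\{\\";

    // Matches any single one of the nine characters.
    public const string AlternationPattern = @"\^|@|\[|\]|`|\}|~|\{|\\";

    // Character class form, which matches the same single characters.
    public const string ClassPattern = @"[@^\[\]`}~{\\]";

    // Counts how many times the pattern matches in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the test string "A^@[BCD" from the benchmark, the sequence pattern finds nothing to replace, which would explain why a Regex run using it looks so fast: it only scans the string and returns it unchanged.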
Thanks for your inputs guys. I wrote a quick and dirty benchmark to test your inputs. I have tested parsing 4 strings with 500.000 iterations and have done 4 passes. The result is as follows:
    *** Pass 1
    Old (Chained String.Replace()) way completed in 814 ms
    logicnp (ToCharArray) way completed in 916 ms
    oleksii (StringBuilder) way completed in 943 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms
    Richard (Regex w/ MatchEvaluator) way completed in 215 ms
    Marc Gravell (Static Regex) way completed in 1008 ms

    *** Pass 2
    Old (Chained String.Replace()) way completed in 786 ms
    logicnp (ToCharArray) way completed in 920 ms
    oleksii (StringBuilder) way completed in 905 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms
    Richard (Regex w/ MatchEvaluator) way completed in 217 ms
    Marc Gravell (Static Regex) way completed in 1025 ms

    *** Pass 3
    Old (Chained String.Replace()) way completed in 775 ms
    logicnp (ToCharArray) way completed in 903 ms
    oleksii (StringBuilder) way completed in 931 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms
    Richard (Regex w/ MatchEvaluator) way completed in 214 ms
    Marc Gravell (Static Regex) way completed in 1022 ms

    *** Pass 4
    Old (Chained String.Replace()) way completed in 799 ms
    logicnp (ToCharArray) way completed in 908 ms
    oleksii (StringBuilder) way completed in 938 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms
    Richard (Regex w/ MatchEvaluator) way completed in 225 ms
    Marc Gravell (Static Regex) way completed in 1050 ms
The code for this benchmark is below. Please review the code and confirm that @Richard has got the fastest way. Note that I haven't checked if outputs were correct, I assumed they were.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    namespace StringReplaceTest
    {
        class Program
        {
            static string test1 = "A^@[BCD";
            static string test2 = "E]FGH\\";
            static string test3 = "ijk`l}m";
            static string test4 = "nopq~{r";

            static readonly Dictionary<char, string> repl =
                new Dictionary<char, string>
                {
                    {'^', "Č"}, {'@', "Ž"}, {'[', "Š"}, {']', "Ć"}, {'`', "ž"},
                    {'}', "ć"}, {'~', "č"}, {'{', "š"}, {'\\', "Đ"}
                };

            static readonly Regex replaceRegex;

            static Program() // static initializer
            {
                // NOTE: Regex.Escape does not escape ']', so with the keys in the
                // order above the ']' key likely closes the character class early
                // and the pattern does not match single characters as intended.
                StringBuilder pattern = new StringBuilder().Append('[');
                foreach (var key in repl.Keys)
                    pattern.Append(Regex.Escape(key.ToString()));
                pattern.Append(']');
                replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
            }

            public static string Sanitize(string input)
            {
                return replaceRegex.Replace(input, match =>
                {
                    return repl[match.Value[0]];
                });
            }

            static string DoGeneralReplace(string input)
            {
                var sb = new StringBuilder(input);
                return sb.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                    .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                    .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ')
                    .ToString();
            }

            //Method for replacing chars with a mapping
            static string Replace(string input, IDictionary<char, char> replacementMap)
            {
                return replacementMap.Keys
                    .Aggregate(input, (current, oldChar)
                        => current.Replace(oldChar, replacementMap[oldChar]));
            }

            static void Main(string[] args)
            {
                for (int i = 1; i < 5; i++)
                    DoIt(i);
            }

            static void DoIt(int n)
            {
                Stopwatch sw = new Stopwatch();
                int idx = 0;

                Console.WriteLine("*** Pass " + n.ToString());

                // old way
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = test1.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result2 = test2.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result3 = test3.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result4 = test4.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                }
                sw.Stop();
                Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                Dictionary<char, char> replacements = new Dictionary<char, char>();
                replacements.Add('^', 'Č');
                replacements.Add('@', 'Ž');
                replacements.Add('[', 'Š');
                replacements.Add(']', 'Ć');
                replacements.Add('`', 'ž');
                replacements.Add('}', 'ć');
                replacements.Add('~', 'č');
                replacements.Add('{', 'š');
                replacements.Add('\\', 'Đ');

                // logicnp way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    char[] charArray1 = test1.ToCharArray();
                    for (int i = 0; i < charArray1.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test1[i], out newChar))
                            charArray1[i] = newChar;
                    }
                    string result1 = new string(charArray1);

                    char[] charArray2 = test2.ToCharArray();
                    for (int i = 0; i < charArray2.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test2[i], out newChar))
                            charArray2[i] = newChar;
                    }
                    string result2 = new string(charArray2);

                    char[] charArray3 = test3.ToCharArray();
                    for (int i = 0; i < charArray3.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test3[i], out newChar))
                            charArray3[i] = newChar;
                    }
                    string result3 = new string(charArray3);

                    char[] charArray4 = test4.ToCharArray();
                    for (int i = 0; i < charArray4.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test4[i], out newChar))
                            charArray4[i] = newChar;
                    }
                    string result4 = new string(charArray4);
                }
                sw.Stop();
                Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // oleksii way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = DoGeneralReplace(test1);
                    string result2 = DoGeneralReplace(test2);
                    string result3 = DoGeneralReplace(test3);
                    string result4 = DoGeneralReplace(test4);
                }
                sw.Stop();
                Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // André Christoffer Andersen way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = Replace(test1, replacements);
                    string result2 = Replace(test2, replacements);
                    string result3 = Replace(test3, replacements);
                    string result4 = Replace(test4, replacements);
                }
                sw.Stop();
                Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // Richard way
                sw.Reset();
                sw.Start();
                Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
                MatchEvaluator eval = match =>
                {
                    switch (match.Value)
                    {
                        case "^": return "Č";
                        case "@": return "Ž";
                        case "[": return "Š";
                        case "]": return "Ć";
                        case "`": return "ž";
                        case "}": return "ć";
                        case "~": return "č";
                        case "{": return "š";
                        case "\\": return "Đ";
                        default: throw new Exception("Unexpected match!");
                    }
                };
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = reg.Replace(test1, eval);
                    string result2 = reg.Replace(test2, eval);
                    string result3 = reg.Replace(test3, eval);
                    string result4 = reg.Replace(test4, eval);
                }
                sw.Stop();
                Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // Marc Gravell way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = Sanitize(test1);
                    string result2 = Sanitize(test2);
                    string result3 = Sanitize(test3);
                    string result4 = Sanitize(test4);
                }
                sw.Stop();
                Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
            }
        }
    }
EDIT June 2020
Since this Q&A is still getting hits, I wanted to update it with additional input from user1664043 using StringBuilder w/ IndexOfAny, this time compiled using .NET Core 3.1, and here are the results:
    *** Pass 1
    Old (Chained String.Replace()) way completed in 199 ms
    logicnp (ToCharArray) way completed in 296 ms
    oleksii (StringBuilder) way completed in 416 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 870 ms
    Richard (Regex w/ MatchEvaluator) way completed in 1722 ms
    Marc Gravell (Static Regex) way completed in 395 ms
    user1664043 (StringBuilder w/ IndexOfAny) way completed in 459 ms

    *** Pass 2
    Old (Chained String.Replace()) way completed in 215 ms
    logicnp (ToCharArray) way completed in 239 ms
    oleksii (StringBuilder) way completed in 341 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 758 ms
    Richard (Regex w/ MatchEvaluator) way completed in 1591 ms
    Marc Gravell (Static Regex) way completed in 354 ms
    user1664043 (StringBuilder w/ IndexOfAny) way completed in 426 ms

    *** Pass 3
    Old (Chained String.Replace()) way completed in 199 ms
    logicnp (ToCharArray) way completed in 265 ms
    oleksii (StringBuilder) way completed in 337 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 817 ms
    Richard (Regex w/ MatchEvaluator) way completed in 1666 ms
    Marc Gravell (Static Regex) way completed in 373 ms
    user1664043 (StringBuilder w/ IndexOfAny) way completed in 412 ms

    *** Pass 4
    Old (Chained String.Replace()) way completed in 199 ms
    logicnp (ToCharArray) way completed in 230 ms
    oleksii (StringBuilder) way completed in 324 ms
    André Christoffer Andersen (Lambda w/ Aggregate) way completed in 791 ms
    Richard (Regex w/ MatchEvaluator) way completed in 1699 ms
    Marc Gravell (Static Regex) way completed in 359 ms
    user1664043 (StringBuilder w/ IndexOfAny) way completed in 413 ms
And the updated code:
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace Test.StringReplace
    {
        class Program
        {
            static string test1 = "A^@[BCD";
            static string test2 = "E]FGH\\";
            static string test3 = "ijk`l}m";
            static string test4 = "nopq~{r";

            static readonly Dictionary<char, string> repl =
                new Dictionary<char, string>
                {
                    {'^', "Č"}, {'@', "Ž"}, {'[', "Š"}, {']', "Ć"}, {'`', "ž"},
                    {'}', "ć"}, {'~', "č"}, {'{', "š"}, {'\\', "Đ"}
                };

            static readonly Regex replaceRegex;

            static readonly char[] badChars = new char[] { '^', '@', '[', ']', '`', '}', '~', '{', '\\' };
            static readonly char[] replacementChars = new char[] { 'Č', 'Ž', 'Š', 'Ć', 'ž', 'ć', 'č', 'š', 'Đ' };

            static Program() // static initializer
            {
                // NOTE: Regex.Escape does not escape ']', so with the keys in the
                // order above the ']' key likely closes the character class early
                // and the pattern does not match single characters as intended.
                StringBuilder pattern = new StringBuilder().Append('[');
                foreach (var key in repl.Keys)
                    pattern.Append(Regex.Escape(key.ToString()));
                pattern.Append(']');
                replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
            }

            public static string Sanitize(string input)
            {
                return replaceRegex.Replace(input, match =>
                {
                    return repl[match.Value[0]];
                });
            }

            static string DoGeneralReplace(string input)
            {
                var sb = new StringBuilder(input);
                return sb.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                    .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                    .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ')
                    .ToString();
            }

            //Method for replacing chars with a mapping
            static string Replace(string input, IDictionary<char, char> replacementMap)
            {
                return replacementMap.Keys
                    .Aggregate(input, (current, oldChar)
                        => current.Replace(oldChar, replacementMap[oldChar]));
            }

            static string ReplaceCharsWithIndexOfAny(string sIn)
            {
                int replChar = sIn.IndexOfAny(badChars);
                if (replChar < 0)
                    return sIn;

                // Don't even bother making a copy unless you know you have something to swap
                StringBuilder sb = new StringBuilder(sIn, 0, replChar, sIn.Length + 10);
                while (replChar >= 0 && replChar < sIn.Length)
                {
                    // NOTE: replChar is a position in sIn, not an index into badChars,
                    // so the replacement is picked by string position. Outputs were
                    // not verified; this code was used for timing only.
                    var c = replacementChars[replChar];
                    sb.Append(c);

                    ////// This approach lets you swap a char for a string or to remove some
                    ////// If you had a straight char for char swap, you could just have your repl chars in an array with the same ordinals and do it all in 2 lines matching the ordinals.
                    ////c = c switch
                    ////{
                    ////    ////case "^":
                    ////    ////    c = "Č";
                    ////    ////    ...
                    ////    '\ufeff' => null,
                    ////    _ => replacementChars[replChar],
                    ////};
                    ////if (c != null)
                    ////{
                    ////    sb.Append(c);
                    ////}

                    replChar++; // skip over what we just replaced
                    if (replChar < sIn.Length)
                    {
                        int nextRepChar = sIn.IndexOfAny(badChars, replChar);
                        sb.Append(sIn, replChar, (nextRepChar > 0 ? nextRepChar : sIn.Length) - replChar);
                        replChar = nextRepChar;
                    }
                }
                return sb.ToString();
            }

            static void Main(string[] args)
            {
                for (int i = 1; i < 5; i++)
                    DoIt(i);
            }

            static void DoIt(int n)
            {
                Stopwatch sw = new Stopwatch();
                int idx = 0;

                Console.WriteLine("*** Pass " + n.ToString());

                // old way
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = test1.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result2 = test2.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result3 = test3.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                    string result4 = test4.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                }
                sw.Stop();
                Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                Dictionary<char, char> replacements = new Dictionary<char, char>();
                replacements.Add('^', 'Č');
                replacements.Add('@', 'Ž');
                replacements.Add('[', 'Š');
                replacements.Add(']', 'Ć');
                replacements.Add('`', 'ž');
                replacements.Add('}', 'ć');
                replacements.Add('~', 'č');
                replacements.Add('{', 'š');
                replacements.Add('\\', 'Đ');

                // logicnp way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    char[] charArray1 = test1.ToCharArray();
                    for (int i = 0; i < charArray1.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test1[i], out newChar))
                            charArray1[i] = newChar;
                    }
                    string result1 = new string(charArray1);

                    char[] charArray2 = test2.ToCharArray();
                    for (int i = 0; i < charArray2.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test2[i], out newChar))
                            charArray2[i] = newChar;
                    }
                    string result2 = new string(charArray2);

                    char[] charArray3 = test3.ToCharArray();
                    for (int i = 0; i < charArray3.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test3[i], out newChar))
                            charArray3[i] = newChar;
                    }
                    string result3 = new string(charArray3);

                    char[] charArray4 = test4.ToCharArray();
                    for (int i = 0; i < charArray4.Length; i++)
                    {
                        char newChar;
                        if (replacements.TryGetValue(test4[i], out newChar))
                            charArray4[i] = newChar;
                    }
                    string result4 = new string(charArray4);
                }
                sw.Stop();
                Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // oleksii way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = DoGeneralReplace(test1);
                    string result2 = DoGeneralReplace(test2);
                    string result3 = DoGeneralReplace(test3);
                    string result4 = DoGeneralReplace(test4);
                }
                sw.Stop();
                Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // André Christoffer Andersen way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = Replace(test1, replacements);
                    string result2 = Replace(test2, replacements);
                    string result3 = Replace(test3, replacements);
                    string result4 = Replace(test4, replacements);
                }
                sw.Stop();
                Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // Richard way
                sw.Reset();
                sw.Start();
                Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
                MatchEvaluator eval = match =>
                {
                    switch (match.Value)
                    {
                        case "^": return "Č";
                        case "@": return "Ž";
                        case "[": return "Š";
                        case "]": return "Ć";
                        case "`": return "ž";
                        case "}": return "ć";
                        case "~": return "č";
                        case "{": return "š";
                        case "\\": return "Đ";
                        default: throw new Exception("Unexpected match!");
                    }
                };
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = reg.Replace(test1, eval);
                    string result2 = reg.Replace(test2, eval);
                    string result3 = reg.Replace(test3, eval);
                    string result4 = reg.Replace(test4, eval);
                }
                sw.Stop();
                Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // Marc Gravell way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = Sanitize(test1);
                    string result2 = Sanitize(test2);
                    string result3 = Sanitize(test3);
                    string result4 = Sanitize(test4);
                }
                sw.Stop();
                Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

                // user1664043 way
                sw.Reset();
                sw.Start();
                for (idx = 0; idx < 500000; idx++)
                {
                    string result1 = ReplaceCharsWithIndexOfAny(test1);
                    string result2 = ReplaceCharsWithIndexOfAny(test2);
                    string result3 = ReplaceCharsWithIndexOfAny(test3);
                    string result4 = ReplaceCharsWithIndexOfAny(test4);
                }
                sw.Stop();
                Console.WriteLine("user1664043 (StringBuilder w/ IndexOfAny) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
            }
        }
    }
the fastest way
The only way is to compare the performance yourself. Try it as in the Q, using StringBuilder and also Regex.Replace.
But micro-benchmarks don't consider the scope of the whole system. If this method is only a small fraction of the overall system its performance probably doesn't matter to the overall application's performance.
Some notes:
1. String as above (I assume) will create lots of intermediate strings: more work for the GC. But it is simple.
2. StringBuilder allows the same underlying data to be modified with each replace. This creates less garbage. It is almost as simple as using String.
3. regex is most complex (because you need to have code to work out the replacement), but allows a single expression. I would expect this to be slower unless the list of replacements is very large and replacements are rare in the input string (i.e. most replace method calls replace nothing, just costing a search through the string).

I expect #2 would be slightly quicker over repeated use (thousands of times) due to less GC load.
For the regex approach you need something like:
    newObj.Name = Regex.Replace(oldObj.Name.Trim(), @"[@^\[\]`}~{\\]", match =>
    {
        switch (match.Value)
        {
            case "^": return "Č";
            case "@": return "Ž";
            case "[": return "Š";
            case "]": return "Ć";
            case "`": return "ž";
            case "}": return "ć";
            case "~": return "č";
            case "{": return "š";
            case "\\": return "Đ";
            default: throw new Exception("Unexpected match!");
        }
    });
This could be done in a reusable way by parameterising with a Dictionary<char,char> to hold the replacements and a reusable MatchEvaluator.
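One possible shape for that reusable version is sketched below. The CharMapper class and its members are hypothetical names, not an existing API. Each key is written into the character class as a \uXXXX escape, an assumption made here so that keys such as ']' or '\\' cannot break the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical reusable wrapper: one Regex and one MatchEvaluator per mapping.
public sealed class CharMapper
{
    private readonly Dictionary<char, char> _map;
    private readonly Regex _regex;
    private readonly MatchEvaluator _evaluator;

    public CharMapper(Dictionary<char, char> map)
    {
        if (map == null || map.Count == 0)
            throw new ArgumentException("At least one replacement is required.", nameof(map));

        _map = map;

        // Build a character class such as [\u005e\u0040...] from the keys.
        string members = string.Concat(
            map.Keys.Select(k => "\\u" + ((int)k).ToString("x4")));
        _regex = new Regex("[" + members + "]");

        // Every match is exactly one character, so look it up directly.
        _evaluator = m => _map[m.Value[0]].ToString();
    }

    public string Apply(string input)
    {
        return _regex.Replace(input, _evaluator);
    }
}
```

A single instance would be created before the import loop and reused for every field, for example newObj.Name = mapper.Apply(oldObj.Name.Trim());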