I'm looking for the fastest way to replace multiple (~500) substrings of a big (~1mb) string. Whatever I have tried it seems that String.Replace is the fastest way of doing it.
I just care about the fastest possible way. Not code readability, maintainability etc. I don't care if I need to use unsafe code or pre-process the original string either.
Each replace iteration will replace ABC on the string with some other string (different per replace iteration). The string to replace will ALWAYS be the same - ABC will always be ABC. Never ABD. So if there are 400.000
thousands replace iterations. The same string - ABC - will be replaced with some other (different) string each time.
I can be in control of what ABC is. I can make it super-short or super-long as long as it doesn't affect the results. Clearly ABC can't be hello cause hello will exist as a word in most of the input strings.
Example input: ABCDABCABCDABCABCDABCABCDABCD
Example replace from string: BC
Example replace with strings: AA, BB, CC, DD, EE (5 iterations)
Example outputs:
AAADAAAAAADAAAAAADAAAAAADAAAD ABBDABBABBDABBABBDABBABBDABBD ACCDACCACCDACCACCDACCACCDACCD ADDDADDADDDADDADDDADDADDDADDD AEEDAEEAEEDAEEAEEDAEEAEEDAEED
Average case: Input string is 100-200kb with 40.000 replace iterations. Worst case: Input string is 1-2mb with 400.000 replace iterations.
I can do ANYTHING. Do it in parallel, do it unsafe, etc. It doesn't matter how I do it. What matters is that it needs to be as fast as it gets.
Show activity on this post. var str = "I have a cat, a dog, and a goat."; str = str. replace(/goat/i, "cat"); // now str = "I have a cat, a dog, and a cat." str = str. replace(/dog/i, "goat"); // now str = "I have a cat, a goat, and a cat." str = str.
Using unsafe
and compiled as x64
result:
Implementation | Exec | GC #1 Simple | 4706ms | 0ms #2 Simple parallel | 2265ms | 0ms #3 ParallelSubstring | 800ms | 21ms #4 Fredou unsafe | 432ms | 15ms
take the code of Erti-Chris Eelmaa
and replace my previous one with this.
I don't think I will do another iteration but i did learn a few thing with unsafe which is a good thing :-)
private unsafe static void FredouImplementation(string input, int inputLength, string replace, string[] replaceBy) { var indexes = new List<int>(); //input = "ABCDABCABCDABCABCDABCABCDABCD"; //inputLength = input.Length; //replaceBy = new string[] { "AA", "BB", "CC", "DD", "EE" }; //my own string.indexof to save a few ms int len = inputLength; fixed (char* i = input, r = replace) { int replaceValAsInt = *((int*)r); while (--len > -1) { if (replaceValAsInt == *((int*)&i[len])) { indexes.Add(len--); } } } var idx = indexes.ToArray(); len = indexes.Count; Parallel.For(0, replaceBy.Length, l => Process(input, inputLength, replaceBy[l], idx, len) ); } private unsafe static void Process(string input, int len, string replaceBy, int[] idx, int idxLen) { var output = new char[len]; fixed (char* o = output, i = input, r = replaceBy) { int replaceByValAsInt = *((int*)r); //direct copy, simulate string.copy while (--len > -1) { o[len] = i[len]; } while (--idxLen > -1) { ((int*)&o[idx[idxLen]])[0] = replaceByValAsInt; } } //Console.WriteLine(output); }
I had a similar issue on a project and I've implemented a Regex solution to perform multiple and case insensitive replacements on a file.
For efficiency purposes, I set criteria to go through the original string only once.
I've published a simple console app to test some strategies on https://github.com/nmcc/Spikes/tree/master/StringMultipleReplacements
The code for the Regex solution is similar to this:
Dictionary<string, string> replacements = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); // Fill the dictionary with the proper replacements: StringBuilder patternBuilder = new StringBuilder(); patternBuilder.Append('('); bool firstReplacement = true; foreach (var replacement in replacements.Keys) { if (!firstReplacement) patternBuilder.Append('|'); else firstReplacement = false; patternBuilder.Append('('); patternBuilder.Append(Regex.Escape(replacement)); patternBuilder.Append(')'); } patternBuilder.Append(')'); var regex = new Regex(patternBuilder.ToString(), RegexOptions.IgnoreCase); return regex.Replace(sourceContent, new MatchEvaluator(match => replacements[match.Groups[1].Value]));
EDIT: The execution times running the test application on my computer are:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With