I'm writing the contents of a text file to a StringBuilder and I then want to perform a number of find/replace actions on the text contained in the StringBuilder using regular expressions.
I've run into a problem because StringBuilder's Replace method does not accept regular expression arguments.
I could use Regex.Replace on a normal string, but I'm under the impression that this is inefficient: .NET strings are immutable, so two copies of the string will need to be created in memory.
Once I've updated the text I plan to write it back to the original file.
What's the best and most efficient way to solve my problem?
EDIT
In addition to the answer(s) below, I've found the following questions that also shed some light on my problem:
The best and most efficient solution for your time is to try the simplest approach first: forget the `StringBuilder` and just use `Regex.Replace`. Then find out how slow it is; it may very well be good enough. Don't forget to try the regex in both compiled and non-compiled mode.
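A minimal sketch of that simplest approach, assuming the replacements can be written as a list of (pattern, replacement) pairs. `SimpleFileRegex`, `ApplyAll` and `RewriteFile` are hypothetical names; the `compiled` flag switches on `RegexOptions.Compiled` so both modes can be timed side by side (for example with `System.Diagnostics.Stopwatch`).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class SimpleFileRegex
{
    // Applies each (pattern, replacement) rule in turn with the static Regex.Replace.
    // Each call returns a new string, so N rules can create up to N intermediate strings.
    public static string ApplyAll(
        string text,
        IEnumerable<(string Pattern, string Replacement)> rules,
        bool compiled)
    {
        // RegexOptions.Compiled typically trades a slower first use for faster matching afterwards.
        var options = compiled ? RegexOptions.Compiled : RegexOptions.None;
        foreach (var (pattern, replacement) in rules)
        {
            text = Regex.Replace(text, pattern, replacement, options);
        }
        return text;
    }

    // Reads the whole file, applies the rules, and writes the result back to the same path.
    public static void RewriteFile(
        string path,
        IEnumerable<(string Pattern, string Replacement)> rules,
        bool compiled = false)
    {
        string text = File.ReadAllText(path);
        File.WriteAllText(path, ApplyAll(text, rules, compiled));
    }
}
```

If the timings for both modes are acceptable for your file sizes, there is no need to go further.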
If that isn't fast enough, consider using a `StringBuilder` for any replacements you can express simply, and then use `Regex.Replace` for the rest. You might also want to consider trying to combine replacements, reducing the number of regexes (and thus intermediate strings) used.
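Both suggestions can be sketched as follows, assuming the simple replacements are plain literal strings held in a dictionary (`CombinedReplace`, `ReplaceSimple` and `ReplaceLiterals` are hypothetical names). `StringBuilder.Replace` handles literal edits in place, while `ReplaceLiterals` folds many literal rules into one alternation regex with a `MatchEvaluator`, so the text is scanned once and only one result string is produced.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class CombinedReplace
{
    // Literal edits done in place on a StringBuilder, one rule after another.
    public static void ReplaceSimple(StringBuilder sb, IReadOnlyDictionary<string, string> map)
    {
        foreach (var pair in map)
            sb.Replace(pair.Key, pair.Value);
    }

    // Many literal rules combined into a single regex pass.
    // Assumes every key is non-empty.
    public static string ReplaceLiterals(string text, IReadOnlyDictionary<string, string> map)
    {
        if (map.Count == 0) return text;

        // Longest keys first, so "cat" wins over "ca" at the same position.
        string pattern = string.Join("|", map.Keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k)));

        var regex = new Regex(pattern);
        // The evaluator looks up the replacement for whichever key matched.
        return regex.Replace(text, m => map[m.Value]);
    }
}
```

Because all keys are matched in a single pass, a swap such as cat→dog plus dog→cat works with `ReplaceLiterals`, whereas chained `Replace` calls would turn every "cat" into "dog" and then back into "cat".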
You have 3 options:
1. Do this in an inefficient way with strings, as others have recommended here.
2. Use the `.Matches()` call on your `Regex` object, and emulate the way `.Replace()` works (see #3).
3. Adapt the Mono implementation of `Regex` to build a `Regex` that accepts `StringBuilder`. Almost all of the work is already done for you in Mono, but it will take time to extract the parts that make it work into their own library. Mono's `Regex` leverages Novell's 2002 JVM implementation of `Regex`, oddly enough.
Expanding on the above:
You can mimic `LTRReplace`'s behavior by calling `.Matches()`, tracking where you are in the original string, and looping:
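```csharp
using System.Text;
using System.Text.RegularExpressions;

var regex = new Regex(@"\d+");   // illustrative pattern
string original = "a1b22c333";   // illustrative input
var matches = regex.Matches(original);
var sb = new StringBuilder(original.Length);
int pos = 0; // position in original string
foreach (Match match in matches)
{
    // Append the portion of the original we skipped, without allocating a substring
    sb.Append(original, pos, match.Index - pos);
    // Make any operations you like on the match result, like your own custom Replace, or even run another Regex
    sb.Append('#'); // illustrative replacement
    pos = match.Index + match.Length;
}
// Append everything after the last match
sb.Append(original, pos, original.Length - pos);
// sb.ToString() is now "a#b#c#"
```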
But this only saves you some strings; the Mono approach is the only one that really eliminates them outright.
This answer has been sitting out since 2014, and I never saw a StringBuilder-based Regex land, either here in the comments or in searching. So, just to get the ball rolling, I extracted the Regex implementation from Mono and put it here:
https://github.com/brass9/RegexStringBuilder
I then created an interface `IString` to allow the inputs and outputs to be passed more loosely, with `string`, `StringBuilder` and `char[]` each wrapped in a class that implements `IString`.
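The wrapper idea can be sketched roughly like this. It is not the interface from the repository above: `ICharSequence`, its three wrapper classes and `CountOf` are hypothetical names, and the members (a `Length` plus a read-only indexer) are an assumption about the minimum a matcher needs. Indexing a large, multi-chunk `StringBuilder` is typically slower than indexing a `string`, which is one reason such an abstraction is not free.

```csharp
using System.Text;

// Hypothetical read-only view over any character buffer.
public interface ICharSequence
{
    int Length { get; }
    char this[int index] { get; }
}

public sealed class StringSequence : ICharSequence
{
    private readonly string _s;
    public StringSequence(string s) => _s = s;
    public int Length => _s.Length;
    public char this[int index] => _s[index];
}

public sealed class StringBuilderSequence : ICharSequence
{
    private readonly StringBuilder _sb;
    public StringBuilderSequence(StringBuilder sb) => _sb = sb;
    public int Length => _sb.Length;
    public char this[int index] => _sb[index];
}

public sealed class CharArraySequence : ICharSequence
{
    private readonly char[] _chars;
    public CharArraySequence(char[] chars) => _chars = chars;
    public int Length => _chars.Length;
    public char this[int index] => _chars[index];
}

public static class CharSequenceExtensions
{
    // Example consumer: works on any wrapper without converting it to a string.
    public static int CountOf(this ICharSequence seq, char c)
    {
        int n = 0;
        for (int i = 0; i < seq.Length; i++)
            if (seq[i] == c) n++;
        return n;
    }
}
```

Code written against the interface, like `CountOf`, never needs to know which buffer type it was handed.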
The result is not fast: Microsoft's highly optimized code runs 10,000 simple replaces about 6x faster than this code. But I've done nothing to optimize it, especially around eliminating strings deeper in the underlying code (in some cases it converts to string to run .ToLower(), only to go back to char arrays).
Contributions welcome. Below is a discussion of how the code worked in Mono as of 2014, shortly before it was removed from Mono in favor of Microsoft's string-based implementation:
`System.Text.RegularExpressions.Regex` uses an `RxCompiler` to instantiate an `IMachineFactory` in the form of an `RxInterpreterFactory`, which, unsurprisingly, makes `IMachine`s as `RxInterpreter`s. Getting those to emit is most of what you need to do, although if you're just looking to learn how it's all structured for efficiency, it's worth noting that much of what you're looking for is in its base class, `BaseMachine`.
In particular, `BaseMachine` is where the `StringBuilder`-based code lives. In the method `LTRReplace`, it first instantiates a `StringBuilder` with the initial string, and everything from there on is purely `StringBuilder`-based. It's actually very annoying that `Regex` doesn't expose `StringBuilder` methods, assuming Microsoft's internal .NET implementation is similar.
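As a sketch of the kind of `StringBuilder` method `Regex` could expose, the hypothetical extension `ReplaceInto` below mirrors a left-to-right replace: it walks matches with `Match`/`NextMatch` and appends into a caller-supplied `StringBuilder` instead of returning a new string. It still needs the input as a `string`, because the `Regex` matching methods used here take string input, and `Match.Result` allocates one small string per match to expand substitutions such as `$1`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexStringBuilderExtensions
{
    // Hypothetical helper: left-to-right replace of every match of `regex` in `input`,
    // appended to `output`. `replacement` may use $1, ${name}, etc.
    public static StringBuilder ReplaceInto(
        this Regex regex, string input, string replacement, StringBuilder output)
    {
        // Right-to-left regexes yield matches in reverse order, which this loop can't handle.
        if (regex.RightToLeft)
            throw new ArgumentException("Only left-to-right regexes are supported.", nameof(regex));

        int pos = 0;
        for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
        {
            output.Append(input, pos, m.Index - pos); // text between matches
            output.Append(m.Result(replacement));     // expanded replacement text
            pos = m.Index + m.Length;
        }
        output.Append(input, pos, input.Length - pos); // tail after the last match
        return output;
    }
}
```

Appending into an existing builder lets several passes share one output buffer, although each pass still reads from a string.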
I'm not sure if this helps your scenario or not, but I ran into some memory consumption ceilings with Regex and needed a simple wildcard replacement extension method on a StringBuilder to push past them. If you need complex Regex matching and/or backreferences, this won't do, but if simple * or ? wildcard replacements (with literal "replace" text) would get the job done for you, then the workaround at the end of my question here should at least give you a boost:
Has anyone implemented a Regex and/or Xml parser around StringBuilders or Streams?
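The linked workaround isn't reproduced here, but a wildcard replacement over a `StringBuilder` could be sketched roughly as follows. `WildcardReplace` is a hypothetical name, and the semantics are assumptions: `?` matches any single character, `*` matches the shortest possible run, and the replacement is literal text. It reads the input through the `StringBuilder` indexer and writes into a second `StringBuilder`, so the whole text is never converted to a string.

```csharp
using System.Text;

public static class StringBuilderWildcard
{
    // Hypothetical sketch: replaces every non-overlapping match of `pattern`
    // ('?' = any one char, '*' = shortest run of any chars) with literal `replacement`.
    public static StringBuilder WildcardReplace(this StringBuilder sb, string pattern, string replacement)
    {
        var result = new StringBuilder(sb.Length);
        int i = 0;
        while (i < sb.Length)
        {
            int end = MatchAt(sb, i, pattern, 0);
            if (end > i) // only non-empty matches count, so the scan always advances
            {
                result.Append(replacement);
                i = end;
            }
            else
            {
                result.Append(sb[i]); // no match here: copy one character through
                i++;
            }
        }
        return result;
    }

    // Returns the end index of the shortest match of pattern[p..] starting at sb[i], or -1.
    private static int MatchAt(StringBuilder sb, int i, string pattern, int p)
    {
        while (p < pattern.Length)
        {
            char pc = pattern[p];
            if (pc == '*')
            {
                // Try the shortest expansion first: 0, 1, 2, ... characters.
                for (int k = i; k <= sb.Length; k++)
                {
                    int end = MatchAt(sb, k, pattern, p + 1);
                    if (end >= 0) return end;
                }
                return -1;
            }
            if (i >= sb.Length) return -1;
            if (pc != '?' && pc != sb[i]) return -1;
            i++;
            p++;
        }
        return i;
    }
}
```

Backtracking over several `*` wildcards can become slow on large inputs, and a pattern whose shortest match is empty (such as a lone `*`) replaces nothing.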