What's the best way to look for a pattern in a (potentially) very large text?
I could use Regex, but it only accepts a string as an argument. Is there a way to use it with a TextReader or some kind of stream instead?
No. A regular expression may need to backtrack, and because a stream can only be read forward, the engine would have to keep the entire stream in memory anyway. Even if your particular expression never backtracks, the Regex engine isn't built to work on streams.
Besides, regular expressions aren't particularly fast anyway. You should look for a pattern-matching method that is designed for reading streams.
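As a rough illustration of a matcher designed for forward-only reading, here is a minimal sketch that searches a TextReader for a fixed (literal) string using the Knuth-Morris-Pratt algorithm. It is not a regular expression replacement: it only handles literal patterns, and the names StreamSearch and FindAll are made up for this example. It reads one character at a time and never needs more memory than the pattern itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the .NET framework.
static class StreamSearch
{
    // Yields the zero-based character offset of every occurrence of
    // `needle` in `reader`, consuming the reader strictly forward.
    public static IEnumerable<long> FindAll(TextReader reader, string needle)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle must be non-empty", nameof(needle));

        // failure[i] = length of the longest proper prefix of needle[0..i]
        // that is also a suffix of it (the standard KMP failure table).
        var failure = new int[needle.Length];
        for (int i = 1, k = 0; i < needle.Length; i++)
        {
            while (k > 0 && needle[i] != needle[k]) k = failure[k - 1];
            if (needle[i] == needle[k]) k++;
            failure[i] = k;
        }

        long position = 0; // number of characters consumed so far
        int matched = 0;   // number of needle characters currently matched
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            position++;
            // On a mismatch, fall back in the pattern instead of re-reading the stream.
            while (matched > 0 && c != needle[matched]) matched = failure[matched - 1];
            if (c == needle[matched]) matched++;
            if (matched == needle.Length)
            {
                yield return position - needle.Length;
                matched = failure[matched - 1]; // keep going so overlapping matches are found
            }
        }
    }
}
```

You would call it with something like `StreamSearch.FindAll(new StreamReader(path), "needle")`, where `path` is whatever file you are scanning. Offsets are counted in characters, not bytes.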
Since your patterns are relatively simple (as indicated in your edit), you should be able to use regular expressions and just read the stream line-by-line. Here is an example that finds words. (Maybe, depending on how you are defining "words." :-) )
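    // Requires: using System; using System.IO; using System.Text.RegularExpressions;
    var pattern = new Regex(@"\b\w+\b");
    using (var reader = new StreamReader(@"..\..\TextFile1.txt"))
    {
        string line;
        // ReadLine() returns null at the end of the stream. This is more
        // reliable than Peek(), which can also return -1 on streams that
        // do not support seeking.
        while ((line = reader.ReadLine()) != null)
        {
            // Walk through every match on the current line.
            Match match = pattern.Match(line);
            while (match.Success)
            {
                Console.WriteLine(match.Value);
                match = match.NextMatch();
            }
        }
    }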
If you are looking for something that involves newlines, then you will have to be a little creative. ReadLine() strips the line terminator, so add it back to the string being searched. Or, if multiple newlines are significant, build the search string in memory with repeated ReadLine() calls until a non-empty line is found. Then process that string and move on in the stream.
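One way to read the suggestion above is to collect consecutive non-empty lines into a block, rejoin them with '\n', and run the regular expression on each block. The sketch below does that. It assumes a match never needs to span a blank line, and the names BlockSearch, ReadBlocks and FindInBlocks are invented for this example. Only one block is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET framework.
static class BlockSearch
{
    // Groups consecutive non-empty lines into blocks joined by '\n'.
    // Blank lines separate blocks and are not returned.
    public static IEnumerable<string> ReadBlocks(TextReader reader)
    {
        var block = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                // A blank line ends the current block, if there is one.
                if (block.Length > 0)
                {
                    yield return block.ToString();
                    block.Clear();
                }
                continue;
            }
            // Put back the newline that ReadLine() stripped.
            if (block.Length > 0) block.Append('\n');
            block.Append(line);
        }
        if (block.Length > 0) yield return block.ToString();
    }

    // Runs the pattern against each block and yields every matched value.
    public static IEnumerable<string> FindInBlocks(TextReader reader, Regex pattern)
    {
        foreach (string block in ReadBlocks(reader))
            foreach (Match match in pattern.Matches(block))
                yield return match.Value;
    }
}
```

With this in place, a pattern such as `new Regex(@"foo\nbar")` can match text that spans two adjacent lines. If your blocks can themselves get very large, you would need a different way to decide where a block ends.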