How can I use Regex against a TextReader?

Tags: .net, regex

What's the best way to look for a pattern in a (potentially) very large text?

I could use Regex, but it accepts a string as an argument. Is there a way to use it with a TextReader or some kind of stream instead?

user261258 asked Feb 27 '23 16:02


2 Answers

No. A regular expression may need to backtrack, and since a stream can only be read forward, the engine would have to keep the entire stream in memory anyway. Even for a regular expression that never backtracks, the engine isn't built for this.

Besides, regular expressions aren't very fast anyway. You should look for a pattern-matching method that is designed for reading streams.
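The answer doesn't name a specific technique. One classic example of a pattern-matching method built for forward-only input is Knuth-Morris-Pratt (KMP) search for a literal string. The C# sketch below is an illustrative choice, not something the answer prescribes, and the function name `FindAll` is hypothetical. It reads the `TextReader` one character at a time and keeps only a table the size of the pattern, so it never rewinds or buffers the input. It handles fixed strings only, not regex syntax.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: Knuth-Morris-Pratt search for a fixed string, reading
// the TextReader one character at a time. Memory use depends only on the
// pattern length, and the reader is never rewound.
IEnumerable<long> FindAll(TextReader reader, string pattern)
{
    if (pattern.Length == 0)
        throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    var failure = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
        if (pattern[i] == pattern[k]) k++;
        failure[i] = k;
    }

    int matched = 0;   // number of pattern characters currently matched
    long position = 0; // index of the character just read
    int c;
    while ((c = reader.Read()) != -1)
    {
        char ch = (char)c;
        // On a mismatch, fall back along the failure table instead of rereading input.
        while (matched > 0 && ch != pattern[matched]) matched = failure[matched - 1];
        if (ch == pattern[matched]) matched++;
        if (matched == pattern.Length)
        {
            // Report the start index of the match, then keep the matched
            // border so overlapping matches are also found.
            yield return position - pattern.Length + 1;
            matched = failure[matched - 1];
        }
        position++;
    }
}

using (var reader = new StringReader("abababa"))
{
    foreach (long start in FindAll(reader, "aba"))
        Console.WriteLine(start);
}
```

Run as a C# 9+ top-level program, this prints 0, 2 and 4. The same function works unchanged on a `StreamReader` over a large file, since `StreamReader` buffers internally even when read character by character.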

Guffa answered May 10 '23 10:05


Since your patterns are relatively simple (as indicated in your edit), you should be able to use regular expressions and just read the stream line by line. Here is an example that finds words. (Maybe, depending on how you define "words." :-) )

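using System;
using System.IO;
using System.Text.RegularExpressions;

var pattern = new Regex(@"\b\w+\b");

using (var reader = new StreamReader(@"..\..\TextFile1.txt"))
{
    string line;
    // ReadLine returns null at the end of the stream. Each line is matched
    // on its own, so only one line is held in memory at a time.
    while ((line = reader.ReadLine()) != null)
    {
        // Matches enumerates every non-overlapping match in the line.
        foreach (Match match in pattern.Matches(line))
        {
            Console.WriteLine(match.Value);
        }
    }
}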
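Since the question asks about `TextReader` specifically, the same loop can be wrapped in a helper that accepts any `TextReader`, so it works for a `StreamReader`, a `StringReader`, or standard input alike. This is a minimal sketch; the name `MatchLines` and the line-number tuple are hypothetical additions, not part of the answer. As with the loop above, a match can never span a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the regex over each line of any TextReader and
// yields the 1-based line number together with each match. Only the current
// line is held in memory, so matches cannot span lines.
IEnumerable<(int LineNumber, Match Match)> MatchLines(TextReader reader, Regex pattern)
{
    int lineNumber = 0;
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;
        foreach (Match m in pattern.Matches(line))
            yield return (lineNumber, m);
    }
}

var words = new Regex(@"\b\w+\b");
using (var input = new StringReader("hello world\nsecond line"))
{
    foreach (var (lineNo, match) in MatchLines(input, words))
        Console.WriteLine($"{lineNo}: {match.Value}");
}
```

Here it prints `1: hello`, `1: world`, `2: second` and `2: line`. Passing `new StreamReader(path)` instead of the `StringReader` gives the file-based behavior of the original loop.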

If you are looking for something that spans newlines, you will have to be a little creative, because ReadLine() strips the line terminator. Either add the newlines back to the string being searched, or, if multiple newlines are significant, build the search string in memory with multiple ReadLine() calls until a non-newline is found, then process that and move on in the stream.
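A minimal sketch of the second suggestion, assuming blank lines separate the records you want to search (that reading of "until a non-newline is found" is an assumption): consecutive non-empty lines are accumulated in a `StringBuilder` with the `\n` that `ReadLine()` strips put back, and the regex runs over each accumulated block. The helper name `ReadBlocks` and the sample key/value pattern are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: groups consecutive non-empty lines into one block,
// restoring the "\n" between them, and yields each block when a blank line
// (or the end of the input) is reached. Only one block is in memory at a time.
IEnumerable<string> ReadBlocks(TextReader reader)
{
    var block = new StringBuilder();
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length == 0)
        {
            // A blank line ends the current block (runs of blank lines are skipped).
            if (block.Length > 0)
            {
                yield return block.ToString();
                block.Clear();
            }
            continue;
        }
        if (block.Length > 0) block.Append('\n');
        block.Append(line);
    }
    if (block.Length > 0) yield return block.ToString();
}

// Sample pattern that spans two lines: "key:" on one line, the value on the next.
// Multiline lets ^ and $ match at line boundaries inside a block.
var pattern = new Regex(@"^(\w+):\n(\w+)$", RegexOptions.Multiline);
var text = "name:\nAlice\n\nage:\n42\n";
using (var input = new StringReader(text))
{
    foreach (string paragraph in ReadBlocks(input))
        foreach (Match m in pattern.Matches(paragraph))
            Console.WriteLine($"{m.Groups[1].Value} = {m.Groups[2].Value}");
}
```

With the sample input this prints `name = Alice` and `age = 42`. Memory use is bounded by the largest block rather than the whole file, but a pattern still cannot match across a blank line.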

Dave Mateer answered May 10 '23 09:05