
Performing regex on a stream

Tags:

java

regex

I have some large text files which I'm going to perform consecutive matching on (just capturing, not replacing). I'm thinking it's not such a good idea to keep the whole file in memory, but rather to use a Reader.

What I know about the input is that if there's a match, it's not going to span more than 5 lines. So my idea was to have some sort of buffer that keeps just these 5 lines or so, do the first search, and continue. But it has to "know" where the regex match ended for this to work, e.g. if the match ends at line 2, it should start the next search from there. Is it possible to do something like this in an efficient way?

gwohpq9 asked Jun 10 '10 10:06


1 Answer

You could use a Scanner and the findWithinHorizon method:

Scanner s = new Scanner(new File("thefile"));
// A horizon of 0 searches without bound; the result is null if nothing matches.
String nextMatch = s.findWithinHorizon(yourPattern, 0);

From the API documentation for findWithinHorizon:

If horizon is 0, then the horizon is ignored and this method continues to search through the input looking for the specified pattern without bound. In this case it may buffer all of the input searching for the pattern.
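
For the consecutive matching described in the question, the same call can be repeated in a loop: each successful findWithinHorizon leaves the scanner positioned just after the match, so the next call resumes from there. Below is a minimal sketch rather than a definitive implementation. The class name StreamMatcher, the method findAll, and the BEGIN ... END pattern are made up for illustration; the captured group is read through Scanner.match().

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class StreamMatcher {
    // Collects group 1 of every match while the Scanner reads the input incrementally.
    // Assumes the pattern has at least one capturing group (hypothetical helper).
    public static List<String> findAll(Readable input, Pattern pattern) {
        List<String> captures = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // Horizon 0 = search without bound. After a hit the scanner sits just
            // past the match, so the next iteration continues from that position.
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                captures.add(scanner.match().group(1));
            }
        }
        return captures;
    }

    public static void main(String[] args) {
        // DOTALL lets ".*?" cross line breaks, so a match may span several lines.
        Pattern p = Pattern.compile("BEGIN\\s+(.*?)\\s+END", Pattern.DOTALL);
        // In practice this would be a FileReader or BufferedReader over the large file.
        Reader r = new StringReader("x\nBEGIN a\nb END\ny\nBEGIN c END\n");
        System.out.println(findAll(r, p));
    }
}
```

A positive horizon would cap how far each search looks ahead and so limit buffering, but it is counted in code points rather than lines, and a search that fails within the horizon returns null without advancing the scanner, so the loop would then need its own way to skip ahead.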

A side note: When matching on multiple lines, you might want to look at the constants Pattern.MULTILINE and Pattern.DOTALL.
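
To illustrate what those two flags change, here is a small sketch; the class FlagDemo and the helper found are invented for this example, and the sample text is arbitrary. DOTALL makes "." match line terminators, while MULTILINE makes "^" and "$" match at line boundaries instead of only at the start and end of the whole input.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Returns true if the regex, compiled with the given flags, is found anywhere in the text.
    public static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";

        // Without DOTALL, "." does not match the newline between the two words.
        System.out.println(found("first.second", 0, text));                 // false
        System.out.println(found("first.second", Pattern.DOTALL, text));    // true

        // Without MULTILINE, "^" only matches at the very start of the input.
        System.out.println(found("^second$", 0, text));                     // false
        System.out.println(found("^second$", Pattern.MULTILINE, text));     // true
    }
}
```

The flags can be combined with a bitwise OR, for example Pattern.MULTILINE | Pattern.DOTALL, when a pattern needs both behaviours.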

aioobe answered Sep 30 '22 00:09