I need to get all substrings that match a regex. I know I could probably build an automaton for it, but I am looking for a simpler solution.
The problem is that Matcher.find() doesn't return all results, because it skips matches that overlap a previous one.
String str = "abaca";
Matcher matcher = Pattern.compile("a.a").matcher(str);
while (matcher.find()) {
    System.out.println(str.substring(matcher.start(), matcher.end()));
}
The result is aba, and not aba, aca as I want.
Any ideas?
EDIT:
Another example: for string=abaa and regex=a.*a, I expect to get aba, abaa, aa.
P.S. If this cannot be achieved using regular expressions, that is also an answer. I just want to know that I'm not re-inventing the wheel for something the language already provides.
$ means "Match the end of the string" (the position after the last character in the string).
You could do something like this:
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static List<String> getAllMatches(String text, String regex) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(getAllMatches("abaca", "a.a"));
        System.out.println(getAllMatches("abaa", "a.*a"));
    }
}
which prints:
[aba, aca]
[abaa, aa]
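The only thing is that aba is missing from the last list of matches. This is because of the greedy .* in a.*a: the lookahead reports only one match per starting position, and a greedy quantifier makes that the longest one. You can't fix this with regex alone (a reluctant a.*?a would return aba but drop abaa). Instead, you could iterate over all possible substrings and call .matches(regex) on each one: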
public static List<String> getAllMatches(String text, String regex) {
    List<String> matches = new ArrayList<String>();
    for (int length = 1; length <= text.length(); length++) {
        for (int index = 0; index <= text.length() - length; index++) {
            String sub = text.substring(index, index + length);
            if (sub.matches(regex)) {
                matches.add(sub);
            }
        }
    }
    return matches;
}
If your text will stay relatively small, this will work, but for larger strings, this may become too computationally intense.
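One way to cut some of the overhead, as a rough sketch rather than a definitive fix: String.matches(regex) recompiles the pattern on every call, so the same brute-force idea can compile the pattern once and test each candidate range with Matcher.region(start, end).matches(). The class name RegionMatches is made up for this illustration. The number of candidate substrings is still quadratic in the text length, so this only trims the constant factor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class RegionMatches {

    static List<String> getAllMatches(String text, String regex) {
        List<String> matches = new ArrayList<>();
        // Compile once and reuse the same Matcher for every candidate range.
        Matcher m = Pattern.compile(regex).matcher(text);
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                // matches() succeeds only if the whole region [start, end) matches.
                if (m.region(start, end).matches()) {
                    matches.add(text.substring(start, end));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(getAllMatches("abaca", "a.a"));  // [aba, aca]
        System.out.println(getAllMatches("abaa", "a.*a"));  // [aba, abaa, aa]
    }
}
```

Because the outer loop runs over start positions, the results come back ordered by where they begin rather than by length.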
By default, a new match starts at the end of the previous one. If your matches can overlap, you need to specify the start point manually:
int start = 0;
while (matcher.find(start)) {
    ...
    start = matcher.start() + 1;
}
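For reference, the loop above could be filled in roughly as follows. This is a minimal sketch: the class and method names (OverlappingFind, findOverlapping) are invented for the example, and the bounds check is an addition, since Matcher.find(int) throws IndexOutOfBoundsException when the start index exceeds the input length (which can happen if the pattern matches the empty string at the very end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, used only for this sketch.
class OverlappingFind {

    static List<String> findOverlapping(String text, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int start = 0;
        // Guard the index: find(int) rejects a start beyond the end of the input.
        while (start <= text.length() && matcher.find(start)) {
            matches.add(matcher.group());
            // Resume one character after the start of this match,
            // so that overlapping matches are not skipped.
            start = matcher.start() + 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("abaca", "a.a"));  // [aba, aca]
        System.out.println(findOverlapping("abaa", "a.*a"));  // [abaa, aa]
    }
}
```

Like the lookahead version, this yields at most one match per starting position, so with a greedy pattern such as a.*a it still does not report aba for the input abaa.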