I recently switched my project from JDK 7 to JDK 8, and now I am rewriting some code snippets to use the new features that came with Java 8.
final Matcher mtr = Pattern.compile(regex).matcher(input);
HashSet<String> set = new HashSet<String>() {{
while (mtr.find()) add(mtr.group().toLowerCase());
}};
How can I write this code using the Stream API?
A Matcher-based spliterator implementation can be quite simple if you reuse the JDK-provided Spliterators.AbstractSpliterator:
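import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;

public class MatcherSpliterator extends AbstractSpliterator<String[]>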
{
private final Matcher m;
public MatcherSpliterator(Matcher m) {
super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
this.m = m;
}
@Override public boolean tryAdvance(Consumer<? super String[]> action) {
if (!m.find()) return false;
final String[] groups = new String[m.groupCount()+1];
for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
action.accept(groups);
return true;
}
}
Note that the spliterator provides all matcher groups, not just the full match. Also note that this spliterator supports parallelism because AbstractSpliterator implements a splitting policy.
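To illustrate the parallelism remark, here is a minimal sketch that runs the same kind of spliterator through a sequential and a parallel stream and gets the same ordered result. The class ParallelMatcherDemo, its nested FullMatchSpliterator (reduced to the full match only) and the words helper are hypothetical names introduced for this example. Keep in mind that the default trySplit in AbstractSpliterator still pulls elements through tryAdvance one at a time, so the matching itself is not parallelized; only the downstream stages (here the map call) can run concurrently.

```java
import java.util.List;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class ParallelMatcherDemo {

    // Hypothetical variant of the spliterator above that emits only the full match.
    static final class FullMatchSpliterator extends AbstractSpliterator<String> {
        private final Matcher m;

        FullMatchSpliterator(Matcher m) {
            super(Long.MAX_VALUE, ORDERED | NONNULL);
            this.m = m;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            if (!m.find()) return false;
            action.accept(m.group());
            return true;
        }
    }

    // Collects all word matches in upper case; 'parallel' selects the stream mode.
    static List<String> words(String input, boolean parallel) {
        Matcher m = Pattern.compile("\\w+").matcher(input);
        return StreamSupport.stream(new FullMatchSpliterator(m), parallel)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "alpha beta gamma delta";
        System.out.println(words(input, false)); // [ALPHA, BETA, GAMMA, DELTA]
        System.out.println(words(input, true));  // same list, same order
    }
}
```

Because the spliterator reports ORDERED, collecting to a list preserves encounter order in both modes.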
Typically you will use a convenience stream factory:
public static Stream<String[]> matcherStream(Matcher m) {
return StreamSupport.stream(new MatcherSpliterator(m), false);
}
This gives you a powerful basis to concisely write all kinds of complex regex-oriented logic, for example:
private static final Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");
public static void main(String[] args) {
final String emails = "[email protected], [email protected], [email protected]";
System.out.println("User has e-mail accounts on these domains: " +
matcherStream(emailRegex.matcher(emails))
.map(gs->gs[2])
.collect(joining(", ")));
}
Which prints
User has e-mail accounts on these domains: gmail.com, yahoo.com, tijuana.com
For completeness, your code will be rewritten as
Set<String> set = matcherStream(mtr).map(gs->gs[0].toLowerCase()).collect(toSet());
Marko's answer demonstrates how to get matches into a stream using a Spliterator. Well done, give that man a big +1! Seriously, make sure you upvote his answer before you even consider upvoting this one, since this one is entirely derivative of his.
I have only a small bit to add to Marko's answer, which is that instead of representing the matches as an array of strings (with each array element representing a match group), the matches are better represented as a MatchResult, which is a type invented for this purpose. Thus the result would be a Stream<MatchResult> instead of Stream<String[]>. The code gets a little simpler, too. The tryAdvance code would be
if (m.find()) {
action.accept(m.toMatchResult());
return true;
} else {
return false;
}
The map call in his email-matching example would change to
.map(mr -> mr.group(2))
and the OP's example would be rewritten as
Set<String> set = matcherStream(mtr)
.map(mr -> mr.group(0).toLowerCase())
.collect(toSet());
Using MatchResult gives a bit more flexibility in that it also provides offsets of match groups within the string, which could be useful for certain applications.
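Putting the pieces of this answer together, a self-contained sketch might look like the following. The class MatchResultDemo and the helpers lowerCaseMatches and describeGroup are hypothetical names made up for this illustration, and the IMMUTABLE characteristic from Marko's version is left out here just to stay conservative. The describeGroup helper shows the offsets mentioned above via MatchResult.start(int) and MatchResult.end(int).

```java
import java.util.List;
import java.util.Set;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class MatchResultDemo {

    static final class MatchResultSpliterator extends AbstractSpliterator<MatchResult> {
        private final Matcher m;

        MatchResultSpliterator(Matcher m) {
            super(Long.MAX_VALUE, ORDERED | NONNULL);
            this.m = m;
        }

        @Override
        public boolean tryAdvance(Consumer<? super MatchResult> action) {
            if (!m.find()) return false;
            // toMatchResult() takes a snapshot that later find() calls do not affect.
            action.accept(m.toMatchResult());
            return true;
        }
    }

    static Stream<MatchResult> matcherStream(Matcher m) {
        return StreamSupport.stream(new MatchResultSpliterator(m), false);
    }

    // The OP's example: every full match, lower-cased, collected into a set.
    static Set<String> lowerCaseMatches(String regex, String input) {
        return matcherStream(Pattern.compile(regex).matcher(input))
                .map(mr -> mr.group(0).toLowerCase())
                .collect(Collectors.toSet());
    }

    // Formats one group of each match as "text@start-end" using the group offsets.
    static List<String> describeGroup(String regex, String input, int group) {
        return matcherStream(Pattern.compile(regex).matcher(input))
                .map(mr -> mr.group(group) + "@" + mr.start(group) + "-" + mr.end(group))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseMatches("[A-Za-z]+", "Foo bar FOO Baz"));
        System.out.println(describeGroup("(\\w+)=(\\d+)", "a=1, bb=22", 2)); // [1@2-3, 22@8-10]
    }
}
```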
I don't think you can turn this into a Stream without writing your own Spliterator, but I don't know why you would want to.
Matcher.find() is a state-changing operation on the Matcher object, so running each find() in a parallel stream would produce inconsistent results. Running the stream serially wouldn't perform better than the Java 7 equivalent and would be harder to understand.
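For comparison, here is a minimal sketch of the loop-based version this answer has in mind, written as a plain loop rather than with the double-brace initializer from the question. The class PlainLoopDemo and the method lowerCaseMatches are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainLoopDemo {

    // Collects every match of 'regex' in 'input', lower-cased, into a set.
    static Set<String> lowerCaseMatches(String regex, String input) {
        Matcher mtr = Pattern.compile(regex).matcher(input);
        Set<String> set = new HashSet<>();
        while (mtr.find()) {
            set.add(mtr.group().toLowerCase());
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseMatches("[A-Za-z]+", "Foo bar FOO Baz"));
    }
}
```

This avoids the anonymous HashSet subclass that double-brace initialization creates, while keeping the same behavior.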