I have a text file that contains URLs and emails. I need to extract all of them from the file. Each URL and email can occur more than once, but the result shouldn't contain duplicates. I can extract all URLs using the following code:
Files.lines(filePath)
.map(urlPattern::matcher)
.filter(Matcher::find)
.map(Matcher::group)
.distinct();
I can extract all emails using the following code:
Files.lines(filePath)
.map(emailPattern::matcher)
.filter(Matcher::find)
.map(Matcher::group)
.distinct();
Can I extract all URLs and emails by reading the stream returned by Files.lines(filePath) only once?
Something like splitting the stream of lines into a stream of URLs and a stream of emails.
If we want to split a stream in two, we can use partitioningBy from the Collectors class. It takes a Predicate and returns a Map that groups elements satisfying the predicate under the Boolean true key and the rest under false.
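A minimal, self-contained sketch of how partitioningBy splits a stream in one pass (the "http" prefix check here is just a stand-in for a real URL pattern, and the input list is made up for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {
    // Split a stream of strings into two lists in a single pass,
    // keyed by whether each element satisfies the predicate.
    static Map<Boolean, List<String>> partition(List<String> input) {
        return input.stream()
                .distinct()  // drop duplicates before partitioning
                .collect(Collectors.partitioningBy(s -> s.startsWith("http")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> map =
                partition(List.of("http://a", "x@y.com", "http://a", "b@c.org"));
        System.out.println(map.get(true));   // elements matching the predicate
        System.out.println(map.get(false));  // everything else
    }
}
```

Both keys are always present in the returned map, even when one partition is empty.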
You can use the partitioningBy collector, though it's still not a very elegant solution:
Map<Boolean, List<String>> map = Files.lines(filePath)
.filter(str -> urlPattern.matcher(str).matches() ||
emailPattern.matcher(str).matches())
.distinct()
.collect(Collectors.partitioningBy(str -> urlPattern.matcher(str).matches()));
List<String> urls = map.get(true);
List<String> emails = map.get(false);
If you don't want to apply the regexp twice, you can use an intermediate pair object (for example, SimpleEntry):
public static String classify(String str) {
return urlPattern.matcher(str).matches() ? "url" :
emailPattern.matcher(str).matches() ? "email" : null;
}
Map<String, Set<String>> map = Files.lines(filePath)
.map(str -> new AbstractMap.SimpleEntry<>(classify(str), str))
.filter(e -> e.getKey() != null)
.collect(Collectors.groupingBy(e -> e.getKey(),
Collectors.mapping(e -> e.getValue(), Collectors.toSet())));
Using my free StreamEx library, the last step would be shorter:
Map<String, Set<String>> map = StreamEx.of(Files.lines(filePath))
.mapToEntry(str -> classify(str), Function.identity())
.nonNullKeys()
.grouping(Collectors.toSet());
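The plain-stream classify/groupingBy approach above can be sketched as a runnable whole. The two regexes below are deliberately simplified placeholders, since the question never shows the real urlPattern and emailPattern:

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClassifyDemo {
    // Simplified patterns, for illustration only.
    static final Pattern urlPattern = Pattern.compile("https?://\\S+");
    static final Pattern emailPattern = Pattern.compile("\\S+@\\S+\\.\\S+");

    // Tag each line with its category, or null when it matches neither pattern.
    static String classify(String str) {
        return urlPattern.matcher(str).matches() ? "url" :
               emailPattern.matcher(str).matches() ? "email" : null;
    }

    // One pass over the lines: classify, drop non-matches, group into sets.
    static Map<String, Set<String>> extract(Stream<String> lines) {
        return lines
                .map(str -> new AbstractMap.SimpleEntry<>(classify(str), str))
                .filter(e -> e.getKey() != null)
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toSet())));
    }

    public static void main(String[] args) {
        System.out.println(extract(
                Stream.of("http://example.com", "a@b.com", "plain text")));
    }
}
```

With real input you would pass Files.lines(filePath) to extract instead of the literal stream.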
You can perform the matching within a Collector:
Map<String, Set<String>> map = Files.lines(filePath)
    .collect(HashMap::new,
        (hm, line) -> {
            Matcher m = emailPattern.matcher(line);
            if (m.matches())
                hm.computeIfAbsent("mail", x -> new HashSet<>()).add(line);
            else if (m.usePattern(urlPattern).matches())
                hm.computeIfAbsent("url", x -> new HashSet<>()).add(line);
        },
        (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v,
            (s1, s2) -> { s1.addAll(s2); return s1; }))
    );
Set<String> mail = map.get("mail"), url = map.get("url");
Note that this can easily be adapted to find multiple matches within a line:
Map<String, Set<String>> map = Files.lines(filePath)
    .collect(HashMap::new,
        (hm, line) -> {
            Matcher m = emailPattern.matcher(line);
            while (m.find())
                hm.computeIfAbsent("mail", x -> new HashSet<>()).add(m.group());
            m.usePattern(urlPattern).reset();
            while (m.find())
                hm.computeIfAbsent("url", x -> new HashSet<>()).add(m.group());
        },
        (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v,
            (s1, s2) -> { s1.addAll(s2); return s1; }))
    );
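Here is a self-contained sketch of that multi-match variant, using literal lines in place of Files.lines and deliberately simplified regexes (the question's actual patterns are not shown):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class MultiMatchDemo {
    // Simplified patterns, for illustration only.
    static final Pattern emailPattern = Pattern.compile("\\S+@\\S+\\.\\w+");
    static final Pattern urlPattern = Pattern.compile("https?://\\S+");

    // Collect every email and URL occurrence from every line in one pass,
    // reusing one Matcher per line via usePattern(...).reset().
    static Map<String, Set<String>> extract(Stream<String> lines) {
        return lines.collect(HashMap::new,
            (hm, line) -> {
                Matcher m = emailPattern.matcher(line);
                while (m.find())
                    hm.computeIfAbsent("mail", x -> new HashSet<>()).add(m.group());
                m.usePattern(urlPattern).reset();  // rescan the same line for URLs
                while (m.find())
                    hm.computeIfAbsent("url", x -> new HashSet<>()).add(m.group());
            },
            (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v,
                (s1, s2) -> { s1.addAll(s2); return s1; })));
    }

    public static void main(String[] args) {
        System.out.println(extract(Stream.of(
                "contact a@b.com or see http://x.org",
                "http://x.org again")));
    }
}
```

Because the values are HashSets, repeated occurrences across lines are deduplicated automatically.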