I would like to find the set of all words in a file. This set should be sorted, and upper and lower case should not matter. Here is my approach:
public static Set<String> setOfWords(String fileName) throws IOException {
    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));
    wordSet = stream
        .map(line -> line.split("[ .,;?!.:()]"))
        .flatMap(Arrays::stream)
        .sorted()
        .map(String::toLowerCase)
        .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}
Test file:
This is a file with five lines.It has two sentences, and the word file is contained in multiple lines of this file. This file can be used for testing?
When printing the set, I get the following output:
Set of words:
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word
Can anybody tell me why the set is not sorted in its natural order (lexicographic, for Strings)?
Thanks in advance
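Collectors.toSet() makes no guarantee about the type or iteration order of the Set it returns (in practice it is a HashSet), so the order established by sorted() is lost when the elements are collected. You can instead collect into a sorted collection such as a TreeSet, using String.CASE_INSENSITIVE_ORDER as the Comparator: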
Set<String> set = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .collect(Collectors.toCollection(() -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
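For reference, here is a minimal self-contained sketch of the TreeSet approach. The class name SortedWordSet and the helper sortedWords are hypothetical names chosen for illustration. The `+` quantifier in the pattern and the isEmpty filter are additions that are not in the snippets above. They are there because split can otherwise yield empty tokens (for example between a comma and the following space), and those would end up in the set.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedWordSet {

    // Hypothetical helper: gathers the words of the given lines into a
    // set that is sorted and de-duplicated without regard to case.
    static Set<String> sortedWords(Stream<String> lines) {
        return lines
                // "+" treats a run of delimiters as a single separator (assumption, not in the original)
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!:()]+")))
                // drop the empty token produced by blank lines or a leading delimiter
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
    }

    // Same signature as in the question; try-with-resources closes the
    // file even if an exception is thrown while reading.
    static Set<String> setOfWords(String fileName) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return sortedWords(lines);
        }
    }

    public static void main(String[] args) {
        Set<String> words = sortedWords(
                Stream.of("This is a file.", "this FILE has words, too!"));
        words.forEach(System.out::println);
    }
}
```

Note that the TreeSet keeps the spelling of the first occurrence of each word, so "This" stays capitalized here. If you want all-lowercase output, add a map(String::toLowerCase) step before collecting.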
Or you can sort the elements using a case-insensitive comparator and collect them into a collection that maintains insertion order:
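List<String> list = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .map(String::toLowerCase) // distinct() uses equals(), so normalize case first or "This" and "this" both survive
    .sorted(String::compareToIgnoreCase)
    .distinct()
    .collect(Collectors.toList());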
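If the method has to keep returning a Set, as in the question, one possible variation of this second approach is to collect into a LinkedHashSet, which iterates in insertion order and removes duplicates on its own. The sketch below is an illustration under assumptions, not part of the answer above: the class name OrderedWordSet and the helper sortedWords are hypothetical, and the `+` quantifier, the isEmpty filter, and Locale.ROOT are additions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrderedWordSet {

    // Hypothetical helper: lower-cases and sorts the words in the stream,
    // then collects them into a set that preserves that sorted order.
    static Set<String> sortedWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!:()]+")))
                // drop the empty token produced by blank lines or a leading delimiter
                .filter(word -> !word.isEmpty())
                // normalize case before sorting so equal words become equal strings
                .map(word -> word.toLowerCase(Locale.ROOT))
                .sorted()
                // LinkedHashSet keeps insertion order and discards duplicates
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        sortedWords(Stream.of("This is a file.", "this FILE has words, too!"))
                .forEach(System.out::println);
    }
}
```

Compared with the TreeSet variant, the resulting set is only sorted at the time it is built: elements added later are appended at the end rather than placed in order.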