
Stream of Strings isn't sorted?

I would like to find the set of all words in a file. The set should be sorted, and upper and lower case should not matter. Here is my approach:

public static Set<String> setOfWords(String fileName) throws IOException {

    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));

    wordSet = stream
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}

Test file:

This is a file with five lines.It has two sentences, and the word file is contained in multiple lines of this file. This file can be used for testing?

When printing the set, I get the following output:

Set of words: 
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word

Can anybody tell me why the set is not sorted in its natural order (lexicographic, for Strings)?

Thanks in advance

Don asked Dec 15 '22 06:12


1 Answer

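Collectors.toSet() makes no guarantee about the type or iteration order of the Set it returns (in practice it is a HashSet), so the order established by sorted() is lost when the elements are collected. You can use a sorted collection like a TreeSet instead, with String.CASE_INSENSITIVE_ORDER as the Comparator: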

Set<String> set = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .collect(Collectors.toCollection(()-> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
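For reference, here is a minimal self-contained sketch of the TreeSet approach. The class name SortedWords and the helper sortedWords are made up for illustration, and the sketch makes two assumptions that go beyond the snippet above: it lower-cases every word (as the question's code does) and drops the empty strings that split can produce.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedWords {

    // Hypothetical helper: collects the words of the given lines into a
    // TreeSet, which keeps its elements sorted regardless of arrival order.
    static Set<String> sortedWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!:()]+")))
                // split yields "" for an empty line or a line that starts with a delimiter
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Same signature as the method in the question; try-with-resources
    // closes the file even if the pipeline throws.
    static Set<String> setOfWords(String fileName) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return sortedWords(lines);
        }
    }

    public static void main(String[] args) {
        Set<String> words = sortedWords(
                Stream.of("This is a file.It has two lines,", "and this file is short?"));
        System.out.println(words);
        // prints [a, and, file, has, is, it, lines, short, this, two]
    }
}
```

Because the words are lower-cased before they reach the set, the natural String ordering of a plain TreeSet is enough here. Use String.CASE_INSENSITIVE_ORDER instead if you want to keep the original casing of the first occurrence.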

Or you can sort the elements using a case-insensitive comparator and collect them into a collection that maintains insertion order:

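List<String> list = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            // distinct() compares with equals(), so lower-case first;
            // otherwise "This" and "this" would both be kept
            .map(String::toLowerCase)
            .sorted(String::compareToIgnoreCase)
            .distinct()
            .collect(Collectors.toList());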
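
If the result has to be a Set rather than a List, the same idea can be sketched with a LinkedHashSet, which iterates in insertion order and removes duplicates on its own. The names OrderedWordSet and orderedWords are hypothetical, and the lower-casing and empty-string filter are assumptions carried over from the question's intent rather than part of the snippet above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrderedWordSet {

    // Hypothetical helper: sorts inside the stream, then collects into a
    // LinkedHashSet so the sorted encounter order survives collection.
    static Set<String> orderedWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!:()]+")))
                .filter(word -> !word.isEmpty())   // drop empty tokens produced by split
                .map(String::toLowerCase)
                .sorted()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(orderedWords(Stream.of("The word file,", "the file.")));
        // prints [file, the, word]
    }
}
```

Unlike a TreeSet, a LinkedHashSet does not re-sort elements added later, so this only fits when the set is not modified after it is built.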

Sleiman Jneidi answered Feb 10 '23 18:02