
Grouping words by first character

What I have: A text file that is read line by line. Each String holds one line.

What I want: Group ALL words by first character using Java Streams.

What I have so far:

public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {

    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            map(s -> s.toLowerCase()).
            sorted((s1, s2) -> s1.compareTo(s2)).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}

Problem: I get an Exception

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:646)
at textana.TextAnalysisFns.lambda$16(TextAnalysisFns.java:110)
at textana.TextAnalysisFns$$Lambda$36/159413332.apply(Unknown Source)
at java.util.stream.Collectors.lambda$groupingBy$196(Collectors.java:907)
at java.util.stream.Collectors$$Lambda$23/189568618.accept(Unknown Source)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.stream.SortedOps$RefSortingSink$$Lambda$37/186370029.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:390)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at textana.TextAnalysisFns.groupByFirstChar(TextAnalysisFns.java:110)
at textana.SampleTextAnalysisApp.main(SampleTextAnalysisApp.java:95)

Question: Why do I get a StringIndexOutOfBoundsException?

Solution based on the hints in the comments:

public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {

    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            filter(s -> s.length() > 0).
            map(s -> s.toLowerCase()).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}

User Eran's solution would have left the empty Strings at the beginning of the result, which I didn't want.

asked May 03 '15 10:05 by SklogW



2 Answers

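Try filtering out empty strings (""): they have no first character, so charAt(0) throws this exception. They show up because split("[^a-zA-Z]") returns an empty string between two adjacent non-letter characters (for example ", "), before a leading non-letter character, and for an empty line.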
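To see where the empty strings come from, here is a minimal sketch. The class name SplitDemo, the helper names and the sample inputs are made up for illustration; only the regex is taken from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original code.
public class SplitDemo {

    // Splits a line exactly as the question does and keeps every token.
    public static List<String> rawTokens(String line) {
        return Arrays.asList(line.split("[^a-zA-Z]"));
    }

    // Same split, with the empty tokens filtered out.
    public static List<String> words(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // ", " is two delimiters in a row, so there is an empty token between them.
        System.out.println(rawTokens("Hello, world").size()); // 3
        // An empty line yields one token: the empty string itself.
        System.out.println(rawTokens("").size());             // 1
        System.out.println(words("Hello, world"));            // [Hello, world]
    }
}
```

Any of those empty tokens reaching charAt(0) is enough to trigger the exception, which is why the filter has to come before the grouping step.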

You can use

flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
filter(s -> !s.trim().isEmpty()). //add this line

BTW, your method should probably use its fileName argument, so consider changing Paths.get(PATH) into something more like

Paths.get(fileName).

or

Paths.get(PATH).resolve(fileName)

Also, as already mentioned in a comment, since you are not changing the default comparison order, you don't need to explicitly write

sorted((s1, s2) -> s1.compareTo(s2))

A simple

sorted()

will work as well since default order will be applied here.


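As mentioned by @Alexis C., groupingBy returns a HashMap in the current implementation (the API makes no guarantee about the map type), which means that your keys will not be ordered. If you would also like to preserve the order in which the keys are encountered, you can pass a LinkedHashMap to groupingBy like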

.collect(Collectors.groupingBy(s -> s.charAt(0), LinkedHashMap::new, Collectors.toList()));
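Putting the suggestions above together, the method might look like the sketch below. The class name GroupWords, the extra Stream<String> overload and the try-with-resources block are my additions, not something from the question; the overload is only there so the pipeline can be tried without a file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class, not part of the original code.
public class GroupWords {

    // Groups the lower-cased words of all lines by their first character.
    public static Map<Character, List<String>> groupByFirstChar(Stream<String> lines) {
        return lines
                .flatMap(line -> Stream.of(line.split("[^a-zA-Z]")))
                .filter(s -> !s.isEmpty())      // drop the empty tokens split() produces
                .map(String::toLowerCase)
                .sorted()                       // natural String order
                .collect(Collectors.groupingBy(
                        s -> s.charAt(0),
                        LinkedHashMap::new,     // keeps keys in encounter order
                        Collectors.toList()));
    }

    // File-based variant: uses the fileName argument and closes the file stream.
    public static Map<Character, List<String>> groupByFirstChar(String fileName)
            throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return groupByFirstChar(lines);
        }
    }
}
```

Because the words are sorted before they are collected, the LinkedHashMap ends up with its keys in alphabetical order. If the sorted() step is dropped, a TreeMap would be another way to get ordered keys.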
answered Oct 18 '22 21:10 by Pshemo



You most likely have an empty line at the end of your file, perhaps silently added by your text editor, that makes the last s.charAt(0) fail.

Hint about how to detect it: in the stack trace, read collect and lambda$16.

answered Oct 18 '22 22:10 by Ekleog
