
Joining strings with limit

Using only the standard Java library, what is a simple mechanism to join strings up to a limit, and append an ellipsis when the limit results in a shorter string?

Efficiency is desirable. Joining all the strings and then using String.substring() may consume excessive memory and time. A mechanism that can be used within a Java 8 stream pipeline is preferable, so that the strings past the limit might never even be created.

For my purposes, I would be happy with a limit expressed in either:

  • Maximum number of strings to join
  • Maximum number of characters in result, including any separator characters.

For example, this is one way to enforce a maximum number of joined strings in Java 8 with the standard library. Is there a simpler approach?

final int LIMIT = 8;

Set<String> mySet = ...;
String s = mySet.stream().limit( LIMIT ).collect( Collectors.joining(", "));
if ( LIMIT < mySet.size()) {
    s += ", ...";
}

Andy Thomas asked Mar 04 '16 18:03


3 Answers

You can write a custom collector for this. This one is based on another I wrote for a similar case:

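// Requires: java.util.ArrayList, java.util.List, java.util.stream.Collector
private static Collector<String, List<String>, String> limitingJoin(String delimiter, int limit, String ellipsis) {
    return Collector.of(
                ArrayList::new,
                // Accumulator: keep the first `limit` elements, then add the ellipsis once.
                (l, e) -> {
                    if (l.size() < limit) l.add(e);
                    else if (l.size() == limit) l.add(ellipsis);
                },
                // Combiner: a size of limit + 1 means l1 already ends with the ellipsis.
                (l1, l2) -> {
                    if (l1.size() <= limit) {
                        int room = limit - l1.size();
                        if (l2.size() > room) {
                            // l2 does not fit entirely: take what fits and mark the truncation.
                            l1.addAll(l2.subList(0, room));
                            l1.add(ellipsis);
                        } else {
                            l1.addAll(l2);
                        }
                    }
                    return l1;
                },
                // Finisher: join the retained elements (and the ellipsis, if any).
                l -> String.join(delimiter, l)
           );
}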

This code keeps an ArrayList<String> of the strings retained so far (at most limit of them, plus the ellipsis). When an element is accepted, the size of the current list is tested against the limit: if it is strictly less than the limit, the element is added; if it is equal to the limit, the ellipsis is added instead. The combiner applies the same rule when merging two partial lists, which is a bit trickier because the sizes of the sublists must be handled properly so as not to exceed the limit. Finally, the finisher joins the list with the given delimiter.

This implementation works for parallel streams and keeps the head elements of the stream in encounter order. Note that it still consumes every element of the stream, even though no elements are added once the limit has been reached.

Working example:

List<String> list = Arrays.asList("foo", "bar", "baz");
System.out.println(list.stream().collect(limitingJoin(", ", 2, "..."))); // prints "foo, bar, ..."
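
For sequential use, the point above about the collector consuming the whole stream can probably be worked around with standard-library calls only: request one element more than the limit, and treat the presence of that extra element as the signal to append the ellipsis. The sketch below is not part of the collector shown above; joinWithLimit and the class name LimitedJoin are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LimitedJoin {
    // Hypothetical helper: joins at most `limit` elements and appends the
    // ellipsis only if the stream held more than `limit` elements.
    static String joinWithLimit(Stream<String> stream, String delimiter, int limit, String ellipsis) {
        // Ask for one element beyond the limit; its presence signals truncation.
        List<String> head = stream.limit(limit + 1L)
                                  .collect(Collectors.toCollection(ArrayList::new));
        if (head.size() > limit) {
            // Replace the surplus element with the ellipsis.
            head.set(limit, ellipsis);
        }
        return String.join(delimiter, head);
    }

    public static void main(String[] args) {
        System.out.println(joinWithLimit(Stream.of("foo", "bar", "baz"), ", ", 2, "...")); // prints "foo, bar, ..."
        System.out.println(joinWithLimit(Stream.of("foo", "bar"), ", ", 2, "..."));        // prints "foo, bar"
        // limit() short-circuits, so an infinite stream terminates as well.
        System.out.println(joinWithLimit(Stream.generate(() -> "x"), ", ", 3, "..."));     // prints "x, x, x, ..."
    }
}
```

Because limit() is a short-circuiting operation, elements past limit + 1 are never requested from the source, at the cost of buffering the head elements in a list before joining.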

Tunaki answered Sep 20 '22 19:09


While using third-party code is not an option for the asker, it might be acceptable for other readers. Even with a custom collector you still have a problem: the whole input will be processed, because standard collectors cannot short-circuit (in particular, it is impossible to process an infinite stream). My StreamEx library extends the collector concept so that short-circuiting collectors can be created. A Joining collector is also readily provided:

StreamEx.of(mySet).collect( 
    Joining.with(", ").ellipsis("...").maxChars(100).cutAfterDelimiter() );

The result is guaranteed not to exceed 100 characters. Different counting strategies can be used: you can limit by chars, by code points or by graphemes (combining Unicode characters are not counted). You can also cut the result at any position ("First entry, second en..."), after a word ("First entry, second ..."), after a delimiter ("First entry, ..."), or before a delimiter ("First entry, second entry..."). It also works for parallel streams, though it is probably not very efficient in the ordered case.
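
For readers restricted to the standard library, the "cut after delimiter" strategy with a character limit can be approximated with a plain iterator loop. This is a minimal sketch, not StreamEx's implementation: joinMaxChars and CharLimitedJoin are hypothetical names, lengths are counted in Java chars (UTF-16 code units) rather than code points or graphemes, and the ellipsis is assumed to fit within the limit.

```java
import java.util.Iterator;
import java.util.stream.Stream;

public class CharLimitedJoin {
    // Hypothetical helper: joins elements so that the result never exceeds
    // maxChars; on overflow it cuts after the last delimiter that still
    // leaves room for the ellipsis.
    static String joinMaxChars(Stream<String> stream, String delimiter, String ellipsis, int maxChars) {
        if (ellipsis.length() > maxChars) {
            throw new IllegalArgumentException("ellipsis is longer than maxChars");
        }
        StringBuilder sb = new StringBuilder();
        int cut = 0; // longest prefix ending after a delimiter with room left for the ellipsis
        Iterator<String> it = stream.iterator(); // pulls elements lazily
        while (it.hasNext()) {
            sb.append(it.next());
            if (sb.length() > maxChars) {
                // Overflow: fall back to the last safe cut point and stop reading.
                sb.setLength(cut);
                return sb.append(ellipsis).toString();
            }
            if (it.hasNext()) {
                sb.append(delimiter);
                if (sb.length() + ellipsis.length() <= maxChars) {
                    cut = sb.length();
                }
            }
        }
        return sb.toString(); // everything fitted, no ellipsis needed
    }

    public static void main(String[] args) {
        Stream<String> entries = Stream.of("First entry", "second entry", "third entry");
        System.out.println(joinMaxChars(entries, ", ", "...", 20)); // prints "First entry, ..."
    }
}
```

The loop stops reading as soon as the limit is exceeded, so the builder never holds more than maxChars characters plus one element and one delimiter.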

Tagir Valeev answered Sep 18 '22 19:09


"Using only the standard Java library"

I don't believe there is anything in there that can do what you ask.

You need to write your own Collector. It won't be that complicated, so I don't see why writing your own would be an issue.

Andreas answered Sep 21 '22 19:09