Let's say that I want to remove all the non-letters from my String.
String s = "abc-de3-2fg";
I can use an IntStream in order to do that:
s.chars().filter(ch -> Character.isLetter(ch)). // But then what?
What can I do in order to convert this stream back to a String instance?
On a different note, why can't I treat a String as a stream of objects of type Character?
String s = "abc-de3-2fg";

// Yields a Stream of char[], therefore doesn't compile
Stream<Character> stream = Stream.of(s.toCharArray());

// Yields a stream with one member - s, which is a String object. Doesn't compile
Stream<Character> stream = Stream.of(s);
According to the javadoc, the Stream's creation signature is as follows:
Stream.of(T... values)
The only (lousy) way that I could think of is:
String s = "abc-de3-2fg";
Stream<Character> stream = Stream.of(s.charAt(0), s.charAt(1), s.charAt(2), ...)
And of course, this isn't good enough... What am I missing?
Here's an answer to the second part of the question. If you have an IntStream resulting from calling string.chars(), you can get a Stream<Character> by casting to char and then boxing the result by calling mapToObj. For example, here's how to turn a String into a Set<Character>:
Set<Character> set =
    string.chars()
          .mapToObj(ch -> (char)ch)
          .collect(Collectors.toSet());
Note that casting to char is essential for the boxed result to be Character instead of Integer.
Now the big problem with dealing with char or Character data is that supplementary characters are represented as surrogate pairs of char values, so any algorithm that deals with individual char values will probably fail when presented with supplementary characters.
(It may seem like supplementary characters are an obscure Unicode feature that we don't need to worry about, but most emoji are supplementary characters.)
Consider this example:
string.chars()
      .filter(Character::isAlphabetic)
      ...
This will fail if presented with a string that contains the code point U+1D400 (Mathematical Bold Capital A). That code point is represented as a surrogate pair in the string, and neither value of a surrogate pair is an alphabetic character. To get the correct result, you'd need to do this instead:
string.codePoints()
      .filter(Character::isAlphabetic)
      ...
I recommend always using codePoints().
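To make the difference concrete, here is a minimal sketch that counts alphabetic values both ways for a string containing U+1D400. The class name CodePointDemo and its two helper methods are hypothetical, introduced only for this illustration.

```java
class CodePointDemo {
    // Counts values that test as alphabetic when iterating UTF-16 code units.
    static long alphabeticChars(String s) {
        return s.chars().filter(Character::isAlphabetic).count();
    }

    // Counts values that test as alphabetic when iterating Unicode code points.
    static long alphabeticCodePoints(String s) {
        return s.codePoints().filter(Character::isAlphabetic).count();
    }

    public static void main(String[] args) {
        // "a" followed by U+1D400 (Mathematical Bold Capital A),
        // which is stored as a surrogate pair of two char values.
        String s = "a" + new String(Character.toChars(0x1D400));
        System.out.println(alphabeticChars(s));      // the surrogate halves are not alphabetic
        System.out.println(alphabeticCodePoints(s)); // the full code point is
    }
}
```

With chars(), only the "a" passes the filter; with codePoints(), both characters do.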
Now, given an IntStream of code points, how can one reassemble it into a String? Sleiman Jneidi's answer is a reasonable one (+1), using the three-arg collect() method of IntStream.
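For reference, a three-arg collect() over code points might look roughly like the sketch below. This is my own rendering of the idea, not a quote from that answer; the class name LetterFilter and the method lettersOnly are hypothetical.

```java
class LetterFilter {
    // Keeps only the letters of s, working on code points rather than chars.
    static String lettersOnly(String s) {
        return s.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,              // supplier: new container
                         StringBuilder::appendCodePoint,  // accumulator: add one code point
                         StringBuilder::append)           // combiner: merge partial results
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("abc-de3-2fg")); // prints "abcdefg"
    }
}
```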
Here's an alternative:
StringBuilder sb = ... ;
string.codePoints()
      .filter(...)
      .forEachOrdered(sb::appendCodePoint);
return sb.toString();
This might be a bit more flexible, in cases where you already have a StringBuilder that you're using to accumulate string data. You don't have to create a new StringBuilder each time, nor do you have to convert it to a String afterwards.
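As a rough illustration of that reuse, the sketch below appends the filtered letters of several strings to one existing StringBuilder. The class name BuilderReuse and the method appendLetters are hypothetical, and the letter filter is just an assumed stand-in for whatever predicate you need.

```java
class BuilderReuse {
    // Appends only the letters of s to an existing builder, preserving order.
    static void appendLetters(StringBuilder sb, String s) {
        s.codePoints()
         .filter(Character::isLetter)
         .forEachOrdered(sb::appendCodePoint);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("result: ");
        appendLetters(sb, "abc-de3-2fg");
        appendLetters(sb, "4h5i");
        System.out.println(sb); // prints "result: abcdefghi"
    }
}
```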