I need to split a text using the separator ". ". For example, I want this string:
Washington is the U.S Capital. Barack is living there.
to be cut into two parts:
Washington is the U.S Capital.
Barack is living there.
Here is my code:
// Initialize the tokenizer
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". ");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
Unfortunately, the output is:
Washington
is
the
U
S
Capital
Barack
is
living
there
Can someone explain what's going on?
hasMoreTokens() returns true if more tokens are available in the tokenizer's string and false otherwise; nextToken() returns the next token. The Java code above uses both of these StringTokenizer methods.
To break a String into tokens, you create a StringTokenizer object and provide the delimiters to split on. You can pass multiple delimiters at once; for example, you can break a String into tokens on both ',' and ':' at the same time. If you don't provide any delimiter, white space is used by default.
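As a small illustration of passing several delimiter characters at once, the sketch below splits on both ',' and ':'. The input string "name:John,age:30" and the helper name tokens are hypothetical, chosen only for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MultiDelimiterDemo {
    // Collects every token produced by a StringTokenizer into a list.
    // Each character of 'delimiters' is treated as a separate delimiter.
    static List<String> tokens(String text, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Both ',' and ':' act as delimiters at the same time (hypothetical input).
        System.out.println(tokens("name:John,age:30", ",:"));
        // prints [name, John, age, 30]
    }
}
```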
From the API, StringTokenizer(String str): Constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character.
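The default delimiter set can be seen in action with a sketch like the following, which mixes spaces, a tab, and a "\r\n" line break in a hypothetical input. Consecutive delimiters are skipped rather than producing empty tokens, so countTokens() reports five words.

```java
import java.util.StringTokenizer;

class DefaultDelimiterDemo {
    // Uses the one-argument constructor, so the delimiters are " \t\n\r\f".
    static int countDefaultTokens(String text) {
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        // Hypothetical input mixing space, tab, newline and carriage return.
        String text = "one two\tthree\nfour\r\nfive";
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }
        System.out.println(countDefaultTokens(text)); // 5
    }
}
```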
The split() method is the preferred and recommended approach even though it may be slower than StringTokenizer, because it is more robust and easier to use.
Don't use StringTokenizer; it's a legacy class. Use java.util.Scanner or simply String.split instead.
String text = "Washington is the U.S Capital. Barack is living there.";
String[] tokens = text.split("\\. ");
for (String token : tokens) {
    System.out.println("[" + token + "]");
}
This prints:
[Washington is the U.S Capital]
[Barack is living there.]
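Note that this split drops the period from the first sentence, whereas the question asked for "Washington is the U.S Capital." One possible variation (not part of the original answer) is a regex lookbehind, (?<=\.) followed by a space, which matches only a space preceded by a dot. Because the lookbehind is zero-width, the dot is not consumed and stays with its sentence. The class and method names are hypothetical.

```java
class KeepPeriodSplit {
    // Splits on a space that follows a literal dot. The lookbehind (?<=\.)
    // checks for the dot without consuming it, so the dot stays in the token.
    static String[] sentences(String text) {
        return text.split("(?<=\\.) ");
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String sentence : sentences(text)) {
            System.out.println("[" + sentence + "]");
        }
        // prints:
        // [Washington is the U.S Capital.]
        // [Barack is living there.]
    }
}
```

This works here because "U.S" is followed by "S", not by a space; an abbreviation such as "U.S. " would still be treated as a sentence break.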
Note that split and Scanner are "regex"-based (regular expressions), and since . is a special regex "meta-character", it needs to be escaped with \. In turn, since \ is itself an escape character for Java string literals, you need to write "\\. " as the delimiter.
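If escaping by hand feels error-prone, java.util.regex.Pattern.quote can build a regex that matches a string literally. The sketch below, with a hypothetical helper splitLiteral, checks that quoting ". " gives the same result as the hand-escaped "\\. ".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Pattern.quote wraps the delimiter so every character in it is matched
    // literally, avoiding the need to escape '.' by hand.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        String[] quoted = splitLiteral(text, ". ");
        String[] escaped = text.split("\\. ");
        System.out.println(Arrays.equals(quoted, escaped)); // true
    }
}
```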
This may sound complicated, but it really isn't. split and Scanner are much superior to StringTokenizer, and regex isn't that hard to pick up.
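For the Scanner route, a minimal sketch could look like this: Scanner.useDelimiter takes a regex, so the dot is escaped exactly as with split. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSplit {
    // Reads tokens separated by the regex "\. " (a literal dot followed by a space).
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("\\. ")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String sentence : sentences(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Like split("\\. "), this drops the period from the first sentence, since the delimiter match consumes it.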
From the java.util.StringTokenizer API:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
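Using the java.util.regex package directly, one option is to compile the delimiter pattern once and reuse it, which may help when many strings are split the same way. This is a sketch; the names PrecompiledSplit and SENTENCE_BREAK are hypothetical. Pattern objects are immutable, so a shared static instance is safe to reuse.

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern SENTENCE_BREAK = Pattern.compile("\\. ");

    static String[] sentences(String text) {
        return SENTENCE_BREAK.split(text);
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String sentence : sentences(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```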
The problem is that StringTokenizer treats each character in the delimiter string as an individual delimiter, i.e. NOT the entire String itself.
From the API:
StringTokenizer(String str, String delim): Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
Your StringTokenizer is constructed with the delimiter string ". ", so every dot and every space acts as a delimiter. That is why "U.S" is broken into "U" and "S".
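One way to confirm that the delimiter string is read as a set of characters is to swap their order: ". " and " ." both describe the set {'.', ' '}, so they should produce identical tokens. The helper below is a hypothetical sketch for that check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Collects all tokens; each character of 'delimiters' is a separate delimiter.
    static List<String> tokens(String text, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        // Same characters in a different order give the same tokens.
        System.out.println(tokens(text, ". ").equals(tokens(text, " ."))); // true
        System.out.println(tokens(text, ". "));
        // prints [Washington, is, the, U, S, Capital, Barack, is, living, there]
    }
}
```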