Tokenize problem in Java with separator ". "

I need to split a text using the separator ". ". For example, I want this string:

Washington is the U.S Capital. Barack is living there.

To be cut into two parts:

Washington is the U.S Capital. 
Barack is living there.

Here is my code:

// Initialize the tokenizer
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". ");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

Unfortunately, the output is:

Washington
is
the
U
S
Capital
Barack
is
living
there

Can someone explain what's going on?

676

asked Jun 04 '10 07:06

poiuytrez

2 Answers

Don't use StringTokenizer; it's a legacy class. Use java.util.Scanner or simply String.split instead.

    String text = "Washington is the U.S Capital. Barack is living there.";
    String[] tokens = text.split("\\. ");
    for (String token : tokens) {
        System.out.println("[" + token + "]");
    }

This prints:

[Washington is the U.S Capital]
[Barack is living there.]

Note that split and Scanner are regex-based (regular expressions). Since . is a regex metacharacter that matches any character, it must be escaped as \. to match a literal dot. In turn, since \ is itself the escape character in Java string literals, you need to write "\\. " as the delimiter.
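For completeness, here is a minimal sketch of the Scanner variant, using the same escaped delimiter. The class name ScannerSplitDemo and the helper splitSentences are made up for this illustration; Scanner.useDelimiter, hasNext and next are the standard java.util.Scanner methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative only.
class ScannerSplitDemo {

    // Collects the pieces of text separated by a literal dot followed by a space.
    static List<String> splitSentences(String text) {
        List<String> parts = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Same regex as in the split example: "\\. " is an escaped dot plus a space.
            scanner.useDelimiter("\\. ");
            while (scanner.hasNext()) {
                parts.add(scanner.next());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String part : splitSentences(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

For this input it should produce the same two pieces as the split version. Scanner is mostly worth the extra lines when the text comes from a stream or file rather than a String already in memory.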

This may sound complicated, but it really isn't. split and Scanner are much superior to StringTokenizer, and regex isn't that hard to pick up.
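One caveat: splitting on "\\. " consumes the dot, so the first piece loses its trailing period, while the question asked for it to be kept. A possible way around that, sketched here as an assumption rather than the only option, is a lookbehind: split on a space that is preceded by a dot, so only the space is removed. The names KeepPeriodSplitDemo and splitKeepingPeriod are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
class KeepPeriodSplitDemo {

    // Splits at every space that directly follows a dot.
    // (?<=\\.) is a zero-width lookbehind, so the dot itself stays in the token.
    static String[] splitKeepingPeriod(String text) {
        return text.split("(?<=\\.) ");
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String token : splitKeepingPeriod(text)) {
            System.out.println("[" + token + "]");
        }
        // prints:
        // [Washington is the U.S Capital.]
        // [Barack is living there.]
    }
}
```

This is still a naive rule: an abbreviation written with a dot before a space (for example "U.S. Capital") would also be split there, so real sentence detection needs more than a single regex.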

Regular expressions tutorials

Java Lessons/Regular expressions
regular-expressions.info - Very good tutorial, not Java specific

API Links

java.util.StringTokenizer
- StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
java.util.Scanner
- A simple text scanner which can parse primitive types and strings using regular expressions.
- Java Tutorials - Basic I/O - Scanning and formatting
String[] String.split
- Splits this string around matches of the given regular expression.

But what went wrong?

The problem is that StringTokenizer treats each character of the delimiter string as a separate delimiter, NOT the string as a whole. With ". " as the argument, the text is split at every dot and at every space.

From the API:

StringTokenizer(String str, String delim): Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
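A small sketch to make the "set of characters" behaviour visible: the order of the characters in delim does not matter, and the result matches a split on the regex character class [. ]+. The class DelimiterSetDemo and its tokenize helper are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are illustrative only.
class DelimiterSetDemo {

    // Runs StringTokenizer over text and returns all tokens as a list.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";

        // ". " and " ." describe the same set of delimiter characters.
        List<String> dotSpace = tokenize(text, ". ");
        List<String> spaceDot = tokenize(text, " .");
        System.out.println(dotSpace.equals(spaceDot));

        // Roughly the regex equivalent: one or more dots or spaces.
        List<String> viaSplit = Arrays.asList(text.split("[. ]+"));
        System.out.println(dotSpace.equals(viaSplit));

        System.out.println(dotSpace);
    }
}
```

Both comparisons should print true for this input, and the list holds the ten single words from the question's output.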

115

answered Oct 09 '22 12:10

polygenelubricants

The delimiter argument ". " that you pass to the StringTokenizer constructor is treated as a set of characters, so both the dot and the space act as delimiters on their own.

answered Oct 09 '22 12:10

krock

Tags:

java

string

tokenize

stringtokenizer