
string tokenizer in Java


I have a text file which contains data separated by '|'. I need to get each field (separated by '|') and process it. A line of the file looks like this:

ABC|DEF||FGHT

I am using StringTokenizer (JDK 1.4) to get each field value. The problem is that I should get an empty string after DEF, but I am not getting the empty field between DEF and FGHT.

My result should be ABC, DEF, "", FGHT, but I am getting ABC, DEF, FGHT.
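
A minimal sketch that reproduces the behaviour described above, assuming one line of the file has already been read into a String. The class name TokenizerRepro and the helper countTokens are hypothetical, added only for illustration.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original question.
class TokenizerRepro {
    // Counts the tokens a plain StringTokenizer returns when '|' is the delimiter.
    static int countTokens(String line) {
        return new StringTokenizer(line, "|").countTokens();
    }

    public static void main(String[] args) {
        StringTokenizer tok = new StringTokenizer("ABC|DEF||FGHT", "|");
        while (tok.hasMoreTokens()) {
            // Consecutive delimiters are skipped as one separator,
            // so the empty field between DEF and FGHT never shows up.
            System.out.println(tok.nextToken());
        }
    }
}
```

Running it prints ABC, DEF and FGHT on separate lines, which matches the three-token result in the question.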

ASD asked Mar 01 '10 13:03




2 Answers

From the StringTokenizer documentation:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

The following code should work:

    String s = "ABC|DEF||FGHT";
    String[] r = s.split("\\|");
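
As a fuller sketch of the split approach, the example below wraps the call in a small helper. The class name SplitDemo and the method fields are hypothetical. It also passes a limit of -1, which is an addition to the answer's code: split(regex) on its own drops trailing empty strings, so a line ending in "||" would lose its last fields without it.

```java
import java.util.Arrays;

// Hypothetical wrapper class for illustration.
class SplitDemo {
    // Splits on a literal '|'. The -1 limit keeps trailing empty fields,
    // which the one-argument split(regex) would discard.
    static String[] fields(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        // The empty field between DEF and FGHT is preserved.
        System.out.println(Arrays.toString(fields("ABC|DEF||FGHT")));
        // Trailing empty fields are preserved too, thanks to the -1 limit.
        System.out.println(Arrays.toString(fields("ABC|DEF||")));
    }
}
```

String.split has been available since JDK 1.4, so this should also apply to the setup in the question.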
Desintegr answered Sep 19 '22 19:09


Use the returnDelims flag and check two subsequent occurrences of the delimiter:

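    String str = "ABC|DEF||FGHT";
    String delim = "|";
    // third argument (returnDelims) = true: delimiters are returned as tokens too
    StringTokenizer tok = new StringTokenizer(str, delim, true);

    boolean expectDelim = false;
    while (tok.hasMoreTokens()) {
        String token = tok.nextToken();
        if (delim.equals(token)) {
            if (expectDelim) {
                // normal separator after a field
                expectDelim = false;
                continue;
            }
            // unexpected delim means empty token
            token = null;
        }

        System.out.println(token);
        // a delimiter is expected next only after a real field; after an empty
        // token the delimiter was already consumed, so runs like "|||" also work
        expectDelim = (token != null);
    }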

This prints:

    ABC
    DEF
    null
    FGHT
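
A variation on the same returnDelims idea, sketched as a reusable helper that collects the fields into a list and uses "" instead of null for empty fields. The class name DelimTokenizer and the method fields are hypothetical. The sketch assumes a single-character delimiter and Java 5 or later (it uses generics).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration.
class DelimTokenizer {
    // Assumes delim is a single character; StringTokenizer treats a longer
    // string as a set of delimiter characters, which this check does not handle.
    static List<String> fields(String line, String delim) {
        List<String> out = new ArrayList<String>();
        StringTokenizer tok = new StringTokenizer(line, delim, true);
        // The start of the line behaves as if a delimiter came just before it.
        boolean lastWasDelim = true;
        while (tok.hasMoreTokens()) {
            String token = tok.nextToken();
            if (delim.equals(token)) {
                if (lastWasDelim) {
                    out.add(""); // two delimiters in a row: empty field
                }
                lastWasDelim = true;
            } else {
                out.add(token);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            out.add(""); // trailing delimiter (or empty line): final empty field
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("ABC|DEF||FGHT", "|")); // [ABC, DEF, , FGHT]
    }
}
```

Tracking whether the previous token was a delimiter, rather than whether one is expected, also covers leading and trailing delimiters.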

The API isn't pretty and is therefore considered legacy (i.e. "almost obsolete"). Use it only where pattern matching is too expensive (which should only be the case for extremely long strings) or where an API expects an Enumeration.

If you switch to String.split(String), make sure to quote the delimiter, either manually ("\\|") or automatically using string.split(Pattern.quote(delim)). Note that Pattern.quote was added in Java 5, so on JDK 1.4 only the manual escape is available.
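
A short sketch of the quoting point, assuming Java 5 or later for Pattern.quote. The class name QuoteDemo and the method splitLiteral are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class QuoteDemo {
    // Treats delim as literal text rather than as a regular expression.
    static String[] splitLiteral(String line, String delim) {
        return line.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        // Unquoted, "|" is regex alternation between two empty patterns,
        // so the split happens between characters instead of at the pipes.
        System.out.println(Arrays.toString("ABC|DEF||FGHT".split("|")));
        // Quoted, the pipe is matched literally and the empty field is kept.
        System.out.println(Arrays.toString(splitLiteral("ABC|DEF||FGHT", "|")));
    }
}
```

Quoting the delimiter automatically is mainly useful when it comes from configuration or user input and its characters are not known in advance.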

sfussenegger answered Sep 21 '22 19:09