Tokenize a string with a space in Java

Tags:

java

tokenize

I want to tokenize a string like this:

String line = "a=b c='123 456' d=777 e='uij yyy'";

I cannot simply split on spaces like this, because the quoted values contain spaces themselves:

String[] words = line.split(" ");

Any idea how I can split it so that I get tokens like these?

a=b
c='123 456'
d=777
e='uij yyy'

kal asked Oct 01 '09 00:10


2 Answers

The simplest way to do this is to hand-implement a simple finite state machine. In other words, process the string one character at a time:

  • When you hit a space outside quotes, break off a token;
  • When you hit a quote, keep collecting characters until you hit another quote.
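The two rules above can be sketched roughly as follows. This is only a minimal illustration, assuming single quotes are the only quoting character, quotes are never escaped, and the quotes should stay in the output (as in the question). The class and method names (QuoteAwareTokenizer, tokenize) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {

    // Splits on spaces, except for spaces inside single-quoted sections.
    // Assumption: quotes are not escaped and are kept as part of the token.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // the only state this machine needs

        for (char ch : line.toCharArray()) {
            if (ch == '\'') {
                inQuotes = !inQuotes;      // entering or leaving a quoted section
                current.append(ch);
            } else if (ch == ' ' && !inQuotes) {
                if (current.length() > 0) { // ignore empty tokens from repeated spaces
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {         // flush the last token
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "a=b c='123 456' d=777 e='uij yyy'";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

If the input ends inside an unterminated quote, this sketch simply returns the rest of the line as one token; stricter handling would be a design choice for the real parser.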
cletus answered Oct 10 '22 14:10


Depending on the format of your original string, you should be able to pass a regular expression to Java's "split" method: Click here for an example.

The linked example doesn't use the regular expression that you would need for this task, though.
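For what it's worth, one regular expression that seems to work with "split" for the string in the question is a space followed by a lookahead that requires an even number of single quotes in the rest of the line. This is just a sketch, and it assumes the quotes are always balanced and never escaped. The class name and constant below are invented for the example.

```java
import java.util.Arrays;

class QuoteAwareSplit {

    // Matches a space only if an even number of single quotes follows it,
    // which means the space sits outside any quoted section.
    // Assumption: quotes in the input are balanced and never escaped.
    static final String SPACE_OUTSIDE_QUOTES = " (?=(?:[^']*'[^']*')*[^']*$)";

    static String[] split(String line) {
        return line.split(SPACE_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "a=b c='123 456' d=777 e='uij yyy'";
        System.out.println(Arrays.toString(split(line)));
    }
}
```

The lookahead rescans the remainder of the line for every space, so this is convenient for short lines but probably not the best choice for very long input.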

You can also use this SO thread as a guideline (although it's in PHP); it does something very close to what you need. Adapting it slightly might do the trick (although whether or not the quotes should be part of the output may cause some issues). Keep in mind that regex syntax is very similar in most languages.
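As an illustration of the matching approach (rather than splitting), here is a rough Java sketch that pulls out key=value pairs and drops the surrounding quotes from the values. It is not taken from the PHP thread. The pattern, the class name and the assumption that keys consist of word characters only are all mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueMatcher {

    // key=value, where value is either '...' (group 2, quotes dropped)
    // or a run of non-space characters (group 3).
    // Assumption: keys are word characters and quotes are never escaped.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:'([^']*)'|(\\S+))");

    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            String value = (m.group(2) != null) ? m.group(2) : m.group(3);
            pairs.put(m.group(1), value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "a=b c='123 456' d=777 e='uij yyy'";
        parse(line).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

To keep the quotes in the output instead, use m.group() for the whole match rather than the individual groups.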

Edit: taking this type of task much further may be beyond the capabilities of regex, so you may need to write a simple parser.

Sev answered Oct 10 '22 15:10