
String Tokenizer: split string by comma and ignore commas in double quotes

Tags: java, string, regex

I have a string like the one below:

value1, value2, value3, value4, "value5, 1234", value6, value7, "value8", value9, "value10, 123.23"

If I tokenize the string above, I get tokens split at every comma. I would like the tokenizer to ignore commas that appear inside double quotes when splitting. How can I do this?
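To make the problem concrete, here is a minimal sketch of the behaviour described. It assumes java.util.StringTokenizer with "," as the delimiter, since the original tokenizing code is not shown; the class and method names (NaiveTokenizeDemo, naiveTokens) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class NaiveTokenizeDemo {
    // Splits on every comma, with no awareness of double quotes.
    static List<String> naiveTokens(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "value1, value2, value3, value4, \"value5, 1234\", "
                + "value6, value7, \"value8\", value9, \"value10, 123.23\"";
        // Quoted values that contain a comma come back as two separate tokens.
        for (String token : naiveTokens(input)) {
            System.out.println(token);
        }
    }
}
```

With this approach the quoted value "value5, 1234" is cut in two at its inner comma, which is the behaviour I want to avoid.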

Thanks in advance

Shashi

Shashi asked Oct 08 '13 06:10


2 Answers

Use a CSV parser like OpenCSV, which automatically handles commas in quoted elements, values that span multiple lines, and so on. You can also use the library to serialize your text back to CSV.

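// Imports for OpenCSV 3.x and later; older releases use au.com.bytecode.opencsv.CSVReader
import com.opencsv.CSVReader;
import java.io.StringReader;

String str = "value1, value2, value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";

CSVReader reader = new CSVReader(new StringReader(str));

// readNext() throws a checked exception, so call it from a method that declares or handles it
String[] tokens;
while ((tokens = reader.readNext()) != null) {
    System.out.println(tokens[0]); // value1
    System.out.println(tokens[4]); // value5, 1234
    System.out.println(tokens[9]); // value10, 123.23
}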

Ravi K Thapliyal answered Sep 28 '22 21:09


You just need one line and the right regex:

String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");

This also neatly trims off the wrapping double quotes for you, including the final quote!

Note: There is an interesting edge case. When the first term is quoted, an extra step is required to trim the leading quote, which is what the replaceAll() call does.
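To show how the one-liner fits together, here is the same pattern assembled from commented pieces. This is only a sketch: the class, constant and method names (QuotedSplit, SEPARATOR, split) are made up for illustration, and it assumes the quotes in the input are balanced.

```java
import java.util.Arrays;

public class QuotedSplit {
    // The separator pattern from the one-liner, built up piece by piece.
    static final String SEPARATOR =
            "\"?"                           // optional closing quote of the previous term
          + "(,|$)"                         // a comma, or the end of the input
          + "(?=(([^\"]*\"){2})*[^\"]*$)"   // only where an even number of quotes follows
          + " *"                            // any spaces after the comma
          + "\"?";                          // optional opening quote of the next term

    static String[] split(String input) {
        // Strip a leading quote first; the separator pattern cannot remove it.
        return input.replaceAll("^\"", "").split(SEPARATOR);
    }

    public static void main(String[] args) {
        String input = "value1, \"value5, 1234\", value6, \"value10, 123.23\"";
        System.out.println(Arrays.toString(split(input)));
    }
}
```

The lookahead is the part that does the work. A comma inside a quoted term is followed by an odd number of quotes, so the lookahead fails there and no split happens.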

Here's some test code:

String input = "\"value1, value2\", value3, value4, \"value5, 1234\", " +
    "value6, value7, \"value8\", value9, \"value10, 123.23\"";
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
for (String s : values) {
    System.out.println(s);
}

Output:

value1, value2
value3
value4
value5, 1234
value6
value7
value8
value9
value10, 123.23

Bohemian answered Sep 28 '22 23:09