I have a string like the one below:
value1, value2, value3, value4, "value5, 1234", value6, value7, "value8", value9, "value10, 123.23"
If I tokenize the string above, it is split at every comma. I would like the tokenizer to ignore commas that appear inside double quotes when it splits. How can I do this?
Thanks in advance
Shashi
Use a CSV parser like OpenCSV to take care of things like commas in quoted elements, values that span multiple lines etc. automatically. You can use the library to serialize your text back as CSV as well.
import com.opencsv.CSVReader; // au.com.bytecode.opencsv.CSVReader in OpenCSV 2.x
import java.io.StringReader;

String str = "value1, value2, value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";
CSVReader reader = new CSVReader(new StringReader(str));
String[] tokens;
// readNext() returns one parsed line at a time, or null at end of input
while ((tokens = reader.readNext()) != null) {
    System.out.println(tokens[0]); // value1
    System.out.println(tokens[4]); // value5, 1234
    System.out.println(tokens[9]); // value10, 123.23
}
You just need one line and the right regex:
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
This also neatly trims off the wrapping double quotes for you, including the final quote!
Note: there is an interesting edge case when the first term is quoted. Its leading quote is not consumed by the split, so it needs the extra step of trimming it with replaceAll().
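To make the lookahead easier to follow, here is a minimal sketch that uses only that part of the idea: a comma is treated as a separator when the rest of the input contains an even number of double quotes, which means the comma is not inside a quoted value. The class name LookaheadSplitDemo and the trimming loop are illustrative assumptions, not part of the one-liner above, and escaped quotes inside values are not handled.

```java
import java.util.Arrays;

final class LookaheadSplitDemo {
    // Matches a comma only if an even number of double quotes follows it,
    // i.e. the comma sits outside any quoted value.
    private static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String input) {
        String[] parts = input.split(SEPARATOR);
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i].trim();
            // Strip one pair of wrapping quotes, if present.
            if (part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"")) {
                part = part.substring(1, part.length() - 1);
            }
            parts[i] = part;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Print each value on its own line so embedded commas stay readable.
        for (String s : split("a, \"b, 1\", c")) {
            System.out.println(s);
        }
    }
}
```

Splitting and unquoting in two steps is longer than the one-liner, but it avoids the special case for a quoted first term.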
Here's some test code:
String input = "\"value1, value2\", value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
for (String s : values) {
    System.out.println(s);
}
Output:
value1, value2
value3
value4
value5, 1234
value6
value7
value8
value9
value10, 123.23
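The lookahead has to scan the rest of the string for every comma it tests, so for long inputs a single pass over the characters may be preferable. The following is only a sketch of that alternative, not the method used above: the class name QuoteAwareScanner is hypothetical, it assumes quotes are never escaped or nested, and it trims whitespace around every value.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareScanner {
    static List<String> split(String input) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // Toggle the quoted state and drop the quote character itself.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // A comma outside quotes ends the current value.
                values.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Add the last value, which has no trailing comma.
        values.add(current.toString().trim());
        return values;
    }
}
```

Calling QuoteAwareScanner.split(input) on the test input above should give the same nine values as the regex version.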