Java : Splitting a String using Regex

Tags:

java

regex

I have to split a string using a comma (,) as the separator, ignoring any comma that is inside quotes (").

fieldSeparator : ,
fieldGrouper : "

The string to split is: "1","2",3,"4,5"

I am able to achieve it as follows:

String record = "\"1\",\"2\",3,\"4,5\"";
String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

Output:

"1"
"2"
3
"4,5"

Now the challenge is that the fieldGrouper (") should not be part of the split tokens. I am unable to figure out the regex for this.

The expected output of the split is:

1
2
3
4,5
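A minimal sketch of one way to reach the expected output, assuming it is acceptable to post-process the tokens (that step is not part of the question): keep the lookahead split above, which accepts a comma only when an even number of quotes follows it up to the end of the string (so the comma is outside any quoted field), then strip one leading and one trailing quote from each token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplit {
    // The question's regex: split on commas followed by an even number of
    // quotes up to the end of the input, i.e. commas outside quoted fields.
    private static final String SEPARATOR = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split the record, then drop the surrounding quotes.
    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        for (String token : record.split(SEPARATOR)) {
            // Remove a quote at the start and/or at the end of the token.
            fields.add(token.replaceAll("^\"|\"$", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        for (String field : splitRecord(record)) {
            System.out.println(field); // prints 1, 2, 3 and 4,5 on separate lines
        }
    }
}
```

Doing the quote removal as a second step keeps the split regex unchanged; the trade-off is a second pass over each token.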
asked Mar 07 '16 11:03 by rvd


3 Answers

Update:

String[] tokens = record.split( "(,*\",*\"*)" );

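A quick check of the updated pattern against the question's record, as a sketch (the class and method names are hypothetical). The record begins with a quote, so the first match is a non-empty match at index 0 and String.split keeps an empty leading token, while the trailing empty token is dropped. The pattern also needs a quote in every separator match, so a comma between two unquoted fields is not split on.

```java
import java.util.Arrays;

public class UpdateSplitCheck {
    // The pattern from the update: optional commas, a quote,
    // then optional commas and optional quotes.
    static String[] tokens(String record) {
        return record.split("(,*\",*\"*)");
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        // The leading quote is a positive-width match at index 0,
        // so an empty first token is included in the result.
        System.out.println(Arrays.toString(tokens(record))); // [, 1, 2, 3, 4,5]
    }
}
```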

Initial solution (doesn't work with the .split method):

This regex pattern will isolate the sections you want:
(?:\\")(.*?)(?:\\")

It uses non-capturing groups to isolate the pairs of escaped quotes, and a capturing group to isolate everything in between.
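Since the pattern does not work with split, here is a sketch of using it with Matcher.find() instead. It assumes the pattern is applied to the runtime string, which contains plain " characters rather than the \" escapes of the Java source, so the adapted pattern below is (?:")(.*?)(?:"). The class and method names are hypothetical. Note that it only collects quoted sections, so the unquoted field 3 is not returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSections {
    // Adapted pattern: a quote, a lazy capturing group, then the next quote.
    private static final Pattern QUOTED = Pattern.compile("(?:\")(.*?)(?:\")");

    static List<String> quotedSections(String record) {
        List<String> sections = new ArrayList<>();
        Matcher m = QUOTED.matcher(record);
        while (m.find()) {
            sections.add(m.group(1)); // text between the quotes
        }
        return sections;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        System.out.println(quotedSections(record)); // [1, 2, 4,5]
    }
}
```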


answered Oct 05 '22 15:10 by Enteleform


My suggestion:

"([^"]+)"|(?<=,|^)([^,]*)

It matches "..."-like strings and captures into Group 1 only what is between the quotes; otherwise, it matches and captures into Group 2 a sequence of characters other than , at the start of the string or right after a comma.

Here is some sample Java code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "value1,\"1\",\"2\",3,\"4,5\",value2";
        Pattern pattern = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");
        Matcher matcher = pattern.matcher(s);
        List<String> res = new ArrayList<>();
        while (matcher.find()) {                     // Run the matcher
            if (matcher.group(1) != null) {          // If Group 1 matched
                res.add(matcher.group(1));           // Add it to the resulting list
            } else {
                res.add(matcher.group(2));           // Add Group 2 as it got matched
            }
        }
        System.out.println(res); // => [value1, 1, 2, 3, 4,5, value2]
    }
}
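The same matching loop can be wrapped in a small reusable method and applied to the question's own record. This is a sketch; the class and method names are hypothetical, and the regex is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvFields {
    // Group 1: contents of a "..." field; Group 2: an unquoted field that
    // starts at the beginning of the input or right after a comma.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");

    // Hypothetical helper name; same loop as the sample above.
    static List<String> fields(String record) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(record);
        while (matcher.find()) {
            String quoted = matcher.group(1);
            result.add(quoted != null ? quoted : matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("\"1\",\"2\",3,\"4,5\"")); // [1, 2, 3, 4,5]
    }
}
```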
answered Oct 05 '22 17:10 by Wiktor Stribiżew


I would try this kind of workaround:

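public class Workaround {
    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        // Replace separator commas (with adjacent quotes) and leftover quotes with spaces
        record = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\"", " ");
        String[] tokens = record.trim().split(" ");
        for (String str : tokens) {
            System.out.println(str);
        }
    }
}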

Output:

1
2
3
4,5
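To see what the replaceAll step produces before the final split, here is a sketch (class and method names are hypothetical). The negative lookbehind skips a comma that directly follows an opening quote plus word characters, such as the comma in "4,5", so only separator commas and stray quotes become spaces. Because the last step splits on spaces, a quoted field that itself contains a space (for example "a b",c) would come back as separate tokens.

```java
public class WorkaroundSteps {
    // The replaceAll step from the workaround above.
    static String markSeparators(String record) {
        return record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\"", " ");
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        String marked = markSeparators(record);
        // Separator commas (with adjacent quotes) and leftover quotes are now spaces.
        System.out.println("[" + marked + "]"); // [ 1 2 3 4,5 ]
        System.out.println(String.join("|", marked.trim().split(" "))); // 1|2|3|4,5
    }
}
```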
answered Oct 05 '22 15:10 by m.cekiera