String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
String[] splitedString = testString.split(PATTERN_STRING);
for (String string : splitedString) {
    System.out.println(string);
}
Here I have a String that encodes a List of Strings into a single String, where the escape character is \ and the delimiter is ,
Note: backslashes in the example above are doubled because it is Java code.
Backslashes and commas in the original Strings are escaped, and the resulting strings are joined with commas. I need a regex that splits this string back into the original list of strings.
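To make the encoding concrete, here is a minimal sketch of the encoder as described above, assuming the items are simply escaped and joined in order with "," (the class name EscapeEncoder is hypothetical, and List.of needs Java 9+). Backslashes have to be escaped before commas; otherwise the backslash added in front of each comma would be doubled again.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, only to illustrate the encoding rule.
public class EscapeEncoder {

    // Escape backslashes first, so the backslashes added for commas
    // are not doubled afterwards. String.replace works on literals, not regex.
    static String escape(String item) {
        return item.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Join the escaped items with an unescaped comma as the delimiter.
    static String encode(List<String> items) {
        return items.stream()
                .map(EscapeEncoder::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Original (unescaped) items; backslashes doubled only for Java source.
        List<String> items = List.of("a,b\\", "c", "d\\,e", "f\\g");
        // Prints the example string from the question: a\,b\\,c,d\\\,e,f\\g
        System.out.println(encode(items));
    }
}
```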
So, for the example string
"a\,b\\,c,d\\\,e,f\\g"
I need to get these strings:
"a\,b\\"
"c"
"d\\\,e"
"f\\g"
So the logic of the split is simple: split on a comma only if the number of backslashes directly before it is even (0, 2, 4, ...). Only in that case is the comma a delimiter. If the number of backslashes before the comma is odd, the comma is escaped and no split should occur.
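The rule itself can be checked directly by counting the run of backslashes in front of each comma. This is a minimal sketch of the rule only, not a full splitter; the class and method names are hypothetical.

```java
// Hypothetical class illustrating the "even number of backslashes" rule.
public class DelimiterRule {

    // Returns true if the character at index i is a comma preceded by an
    // even number (0, 2, 4, ...) of consecutive backslashes.
    static boolean isDelimiter(String s, int i) {
        if (s.charAt(i) != ',') {
            return false;
        }
        int backslashes = 0;
        // Walk left from the comma, counting the run of backslashes.
        for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',') {
                System.out.println(i + " -> " + (isDelimiter(s, i) ? "delimiter" : "escaped"));
            }
        }
    }
}
```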
Can anybody help me with an appropriate regex for this case?
EDIT
I know that the regex (?<!\\\\), splits the string on commas that have no backslash directly before them. But in my case I also need to split when the number of backslashes before the comma is even (and non-zero).
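For example, this sketch (hypothetical class and method names) shows that the simpler look-behind leaves a\,b\\,c as a single piece, because the delimiter comma after b\\ still has a backslash directly before it:

```java
import java.util.Arrays;

// Hypothetical demo of why (?<!\\), is not enough.
public class SimpleLookbehindDemo {

    // Splits only on commas that have no backslash directly before them.
    static String[] splitSimple(String s) {
        return s.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c"; // actual text: a\,b\\,c
        // The comma after b\\ should be a delimiter, but it is not split here.
        System.out.println(Arrays.toString(splitSimple(s)));
    }
}
```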
Appreciate any help.
If it has to be done with split, you can try something like
split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),")
I used {0,1000000000} instead of * because a look-behind in Java needs an obvious maximum length, and 1000000000 seems good enough unless your text can contain more than 1000000000 consecutive pairs of backslashes (\\).
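One caveat with split(regex): it drops trailing empty strings, so an encoded list whose last item is empty (for example x,y\\, where the last item is the empty string) loses that item. A minimal sketch, assuming you want to keep empty items, is to pass a negative limit to split(regex, limit); the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class showing split with a negative limit.
public class SplitKeepTrailing {

    // The look-behind regex from this answer: split on a comma that is NOT
    // preceded by an odd-length run of backslashes.
    static final String DELIMITER = "(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),";

    static String[] decodeAll(String encoded) {
        // A negative limit keeps trailing empty strings, which split(regex) drops.
        return encoded.split(DELIMITER, -1);
    }

    public static void main(String[] args) {
        String encoded = "x,y\\\\,"; // actual text: x,y\\, (last item is empty)
        System.out.println(Arrays.toString(encoded.split(DELIMITER))); // trailing "" dropped
        System.out.println(Arrays.toString(decodeAll(encoded)));       // trailing "" kept
    }
}
```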
If you don't have to use split, you can use a Matcher instead:
Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)",
        Pattern.DOTALL).matcher(testString);
while (m.find()) {
    System.out.println(m.group(1));
}
\\G means the end of the previous match or, on the first find() when there is no previous match yet, the start of the string (like ^).
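A small illustration of that \G behaviour (hypothetical class and method names): with \G each match has to start exactly where the previous one ended, so a non-matching character stops the find() loop instead of being skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of \G anchoring successive find() calls.
public class ContiguousMatchDemo {

    // Counts how many times the regex matches in the input via repeated find().
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Without \G, find() may skip ahead to any later "a".
        System.out.println(countMatches("a", "aba"));
        // With \G, each match must start where the previous one ended
        // (or at the start of the input for the first find()),
        // so the 'b' breaks the chain.
        System.out.println(countMatches("\\Ga", "aba"));
    }
}
```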
But the fastest option, and not that hard to implement, would be writing your own parser that uses a flag like escaped to signal that the currently checked character was escaped with \.
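import java.util.ArrayList;
import java.util.List;

public static List<String> parse(String text) {
    List<String> tokens = new ArrayList<>();
    // true when the current character is escaped, i.e. the previous
    // character was a backslash that was not itself escaped
    boolean escaped = false;
    StringBuilder sb = new StringBuilder();
    for (char ch : text.toCharArray()) {
        if (ch == ',' && !escaped) {
            // unescaped comma: end of the current token
            tokens.add(sb.toString());
            sb.setLength(0);
        } else {
            if (ch == '\\')
                escaped = !escaped; // a backslash escapes the next char unless it is itself escaped
            else
                escaped = false;
            sb.append(ch); // escapes are kept in the token, as in the expected output
        }
    }
    // add the last token; like split(), a trailing empty token is dropped
    if (sb.length() > 0) {
        tokens.add(sb.toString());
    }
    return tokens;
}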
String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
String[] splitedString = testString
        .split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),");
for (String string : splitedString) {
    System.out.println(string);
}
System.out.println("-----");
Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)",
        Pattern.DOTALL).matcher(testString);
while (m.find()) {
    System.out.println(m.group(1));
}
System.out.println("-----");
for (String s : parse(testString)) {
    System.out.println(s);
}
Output:
a\,b\\
c
d\\\,e
f\\g
-----
a\,b\\
c
d\\\,e
f\\g
-----
a\,b\\
c
d\\\,e
f\\g
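All three approaches return the tokens still in escaped form, which matches the expected output in the question. If you also need the original, unescaped strings, one extra pass can drop each escape character. This is a minimal sketch, assuming the only escape sequences are \\ and \, as described in the question; the class name Unescape is hypothetical.

```java
// Hypothetical helper that removes one level of escaping from a token.
public class Unescape {

    // "\x" becomes "x" for any character x. A dangling trailing backslash
    // (malformed input) is kept as-is.
    static String unescape(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        boolean escaped = false;
        for (char ch : token.toCharArray()) {
            if (escaped) {
                sb.append(ch);     // escaped character is taken literally
                escaped = false;
            } else if (ch == '\\') {
                escaped = true;    // drop the escape character itself
            } else {
                sb.append(ch);
            }
        }
        if (escaped) {
            sb.append('\\');       // malformed input: keep the dangling backslash
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("a\\,b\\\\"));   // token a\,b\\ from the output above
        System.out.println(unescape("d\\\\\\,e"));   // token d\\\,e
    }
}
```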