
Handling delimiter with escape characters in Java String.split() method

I have searched the web but couldn't find an answer that fits my requirement exactly. I have a string like this:

A|B|C|The Steading\|Keir Allan\|Braco|E

The output should look like this:

A
B
C
The Steading|Keir Allan|Braco
E

My requirement is to skip the delimiter if it is preceded by the escape character. I have tried the following, using a negative lookbehind in String.split():

(?<!\\)\|

However, the delimiter is defined dynamically by the end user and need not always be |. It can be any character on the keyboard (no restrictions). My concern is that the regex above might fail for characters that have a special meaning in regex.

Is this the right way to do it?

user2757740 Avatar asked Sep 07 '13 20:09




2 Answers

You can use Pattern.quote():

String regex = "(?<!\\\\)" + Pattern.quote(delim);

Using your example:

String delim = "|";
String regex = "(?<!\\\\)" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading\\|Keir Allan\\|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading\|Keir Allan\|Braco
E

You can extend this to use a custom escape sequence as well:

String delim = "|";
String esc = "+";
String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading+|Keir Allan+|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading+|Keir Allan+|Braco
E
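
Note that both outputs above still contain the escape character in front of each escaped delimiter, whereas the question asks for it to be dropped. A minimal sketch of one way to do that follows; the class and method names (EscapedSplit, split) are made up for illustration, and it assumes that the escape character only ever needs to escape the delimiter. The second call uses "." as the delimiter to show that Pattern.quote() keeps a regex metacharacter literal.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedSplit {
    // Hypothetical helper: splits on delim unless it is directly preceded by esc,
    // then removes the escape from each escaped delimiter left inside a field.
    static String[] split(String input, String delim, String esc) {
        String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
        String[] parts = input.split(regex);
        for (int i = 0; i < parts.length; i++) {
            // String.replace treats both arguments as literal text, so no quoting is needed.
            parts[i] = parts[i].replace(esc + delim, delim);
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [A, B, C, The Steading|Keir Allan|Braco, E]
        System.out.println(Arrays.toString(
                split("A|B|C|The Steading\\|Keir Allan\\|Braco|E", "|", "\\")));
        // "." is a regex metacharacter; prints [a, b.c, d]
        System.out.println(Arrays.toString(split("a.b\\.c.d", ".", "\\")));
    }
}
```

Like String.split() itself, this sketch drops trailing empty fields; passing a negative limit to split() would keep them.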
arshajii Avatar answered Sep 20 '22 07:09



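I know this is an old thread, but the lookbehind solution has an issue: it doesn't allow the escape character itself to be escaped. For example, in A|B|C|The Steading\\|Keir Allan\|Braco|E the \\ is an escaped backslash, so the | right after it should act as a delimiter, but the lookbehind sees a backslash in front of it and the split does not occur there.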

The positive matching solution in the thread "Regex and escaped and unescaped delimiter" works better (modified to use Pattern.quote() if the delimiter is dynamic).
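
To make the escaped-escape case concrete, here is a rough sketch that avoids regex altogether. It is a hand-written scanner, not the regex from the thread mentioned above, and the names (EscapeAwareSplitter, split) are invented for illustration. It assumes the delimiter and the escape are single, different characters, and that the character after an escape is always taken literally.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Hypothetical helper: walks the input once. The character after an escape is
    // copied as literal text (including the escape itself), and the escape is dropped.
    static List<String> split(String input, char delim, char esc) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == esc && i + 1 < input.length()) {
                current.append(input.charAt(++i)); // keep the escaped character only
            } else if (c == delim) {
                fields.add(current.toString());    // unescaped delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);                 // ordinary character, or a lone trailing escape
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Java literal for the text: A|B|C|The Steading\\|Keir Allan\|Braco|E
        String input = "A|B|C|The Steading\\\\|Keir Allan\\|Braco|E";
        // prints [A, B, C, The Steading\, Keir Allan|Braco, E]
        System.out.println(split(input, '|', '\\'));
    }
}
```

Because no regex is involved, the delimiter needs no quoting at all. Unlike String.split(), this sketch keeps trailing empty fields.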

Jan Cetkovsky Avatar answered Sep 21 '22 07:09
