
Splitting a string that has escape sequence using regular expression in Java

Tags: java, regex, split

String to be split:

abc:def:ghi\:klm:nop

The string should be split on ":", with "\" acting as the escape character, so "\:" should not be treated as a delimiter.

split(":") gives

[abc]
[def]
[ghi\]
[klm]
[nop]

The required output is an array of strings:

[abc]
[def]
[ghi\:klm]
[nop]

How can the \: be ignored when splitting?

rgx asked Oct 06 '10 07:10


1 Answer

Use a look-behind assertion:

split("(?<!\\\\):")

This matches a colon only if it is not preceded by a \. The four backslashes \\\\ are required because the backslash has to be escaped twice: once for the Java string literal and once for the regular expression.
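As a minimal sketch of the look-behind split applied to the string from the question (the class and method names here are made up for illustration):

```java
import java.util.Arrays;

public class EscapedSplit {
    // Split on ':' unless the colon is immediately preceded by a backslash.
    // The Java literal "(?<!\\\\):" is the regex (?<!\\): at runtime.
    static String[] splitUnescaped(String input) {
        return input.split("(?<!\\\\):");
    }

    public static void main(String[] args) {
        // Java literal for the runtime string abc:def:ghi\:klm:nop
        String input = "abc:def:ghi\\:klm:nop";
        System.out.println(Arrays.toString(splitUnescaped(input)));
        // prints [abc, def, ghi\:klm, nop]
    }
}
```

The escaped colon stays in the token as \: and is not unescaped by the split itself.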

Note, however, that this does not let you escape the backslash itself, which you need if a token should be able to end with a backslash. To support that, first replace all double backslashes with a placeholder:

string.replaceAll("\\\\\\\\", ESCAPE_BACKSLASH)

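(where ESCAPE_BACKSLASH is a string that will not occur in your input and contains no regex or replacement metacharacters, since replaceAll interprets both of its arguments). Then, after splitting with the look-behind assertion, replace the ESCAPE_BACKSLASH string in each token with an unescaped backslash: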

token.replaceAll(ESCAPE_BACKSLASH, "\\\\")
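The three steps can be put together roughly as follows. This is only a sketch: the class name and the NUL-character placeholder are assumptions, and it uses String.replace (literal replacement) instead of replaceAll so the placeholder never has to be regex-safe.

```java
import java.util.Arrays;

public class EscapedSplitWithBackslashes {
    // Hypothetical placeholder: a NUL character, assumed never to occur in the input.
    static final String ESCAPE_BACKSLASH = "\u0000";

    static String[] split(String input) {
        // 1. Hide every escaped backslash (two backslashes) behind the placeholder.
        String masked = input.replace("\\\\", ESCAPE_BACKSLASH);
        // 2. Split on colons that are not preceded by a remaining backslash.
        String[] tokens = masked.split("(?<!\\\\):");
        // 3. Turn each placeholder back into a single, unescaped backslash.
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace(ESCAPE_BACKSLASH, "\\");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java literal for the runtime string a\\:b\:c:d
        String input = "a\\\\:b\\:c:d";
        System.out.println(Arrays.toString(split(input)));
        // prints [a\, b\:c, d]
    }
}
```

Here the first token ends with a backslash because \\ was read as an escaped backslash, so the colon after it still acts as a delimiter.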
Gumbo answered Sep 22 '22 10:09