My input string contains mixed types of line separators: '\r\n', '\r' and '\n'. I want to split the string and keep each line separator with the substring that precedes it. I followed the two posts below
How to split a string, but also keep the delimiters?
Split Java String by New Line
and came up with something like:
String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=((\\r\\n)|\\r|\\n))");
The output is ["1 dog \r", "\n", " 2 cat"], but the desired output is ["1 dog \r\n", " 2 cat"].
If I change the input to either "1 dog \r 2 cat" or "1 dog \n 2 cat", my code produces the desired output. Please advise. Thanks in advance.
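A minimal, runnable reproduction of the three cases described above could look like the following sketch. The class name SplitRepro and the helper show (which only escapes \r and \n so the pieces are readable when printed) are hypothetical names introduced for illustration; the pattern is the one from the question.

```java
import java.util.Arrays;

public class SplitRepro {
    // The lookbehind pattern from the question; its alternation is NOT atomic.
    static final String PATTERN = "(?<=((\\r\\n)|\\r|\\n))";

    // Escapes \r and \n so the separators are visible in the console.
    static String show(String[] parts) {
        return Arrays.toString(parts).replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // "\r\n": the input is cut twice, leaving "\n" as a piece of its own.
        System.out.println(show("1 dog \r\n 2 cat".split(PATTERN)));
        // A lone "\r" or a lone "\n": the input is cut once, as intended.
        System.out.println(show("1 dog \r 2 cat".split(PATTERN)));
        System.out.println(show("1 dog \n 2 cat".split(PATTERN)));
    }
}
```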
You get the result ["1 dog \r", "\n", " 2 cat"] because your pattern uses an alternation that matches either (\r\n), \r or \n, and any one of them is enough to satisfy the lookbehind.
When \r\n is encountered in the example string, the lookbehind assertion is already true after the \r (the \r alternative matches), so the string is split for the first time. The lookbehind assertion is then true again after the \n, so the string is split a second time.
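The two split points can be made visible with a small sketch that runs a lookbehind pattern through java.util.regex.Matcher and records the index of every zero-width match, which are the positions where String.split cuts the input. SplitPositions and splitPoints are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPositions {
    // Returns every index at which the zero-width lookbehind succeeds,
    // i.e. every place where String.split would cut the input.
    static List<Integer> splitPoints(String regex, String input) {
        List<Integer> points = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            points.add(m.start());
        }
        return points;
    }

    public static void main(String[] args) {
        String input = "1 dog \r\n 2 cat"; // '\r' is at index 6, '\n' at index 7
        // Question's pattern: true after '\r' (index 7) via the "\r" branch,
        // and again after '\n' (index 8).
        System.out.println(splitPoints("(?<=((\\r\\n)|\\r|\\n))", input)); // [7, 8]
    }
}
```

Index 7 is the position between the \r and the \n, which is exactly the unwanted first split described above.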
What you could do instead is use \R (available since Java 8) in the positive lookbehind to assert that what is on the left is a Unicode linebreak sequence:
String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=\\R)");
Java demo
Another option is to fix your regex by making the alternation an atomic group:
(?<=(?>\\r\\n|\\r|\\n))
Java demo
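A sketch of the atomic-group version, applied both to the example input and to a string that mixes all three separators. AtomicSplit and splitKeepingSeparators are hypothetical names; the pattern itself is the one from the answer.

```java
public class AtomicSplit {
    // Atomic group: once "\r\n" has matched, the engine will not retry the
    // shorter "\r" alternative, so there is no extra cut between '\r' and '\n'.
    static final String ATOMIC = "(?<=(?>\\r\\n|\\r|\\n))";

    static String[] splitKeepingSeparators(String input) {
        return input.split(ATOMIC);
    }

    public static void main(String[] args) {
        String input = "first\r\nsecond\rthird\nfourth";
        for (String part : splitKeepingSeparators(input)) {
            // Escape the separators so they are visible in the console.
            System.out.println(part.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

With this pattern, "1 dog \r\n 2 cat" should split into "1 dog \r\n" and " 2 cat", and "first\r\nsecond\rthird\nfourth" into "first\r\n", "second\r", "third\n" and "fourth", each piece keeping the separator that ended it.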
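As explained in this post, when the atomic group in the lookbehind starts matching at the \r, its first alternative \r\n also consumes the following \n. Because the group is atomic, the engine cannot backtrack and retry the shorter \r alternative, so the lookbehind fails between the \r and the \n and the string is split only after the \n.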
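The same commit-and-no-backtracking behaviour can be seen outside a lookbehind. In this sketch (AtomicVsPlain, PLAIN and ATOMIC are hypothetical names), both patterns are matched against the two-character string "\r\n"; they differ only in whether the group is plain or atomic.

```java
import java.util.regex.Pattern;

public class AtomicVsPlain {
    // Plain (backtracking) group followed by a literal \n.
    static final String PLAIN = "(?:\\r\\n|\\r)\\n";
    // Atomic group followed by a literal \n.
    static final String ATOMIC = "(?>\\r\\n|\\r)\\n";

    public static void main(String[] args) {
        String crlf = "\r\n";
        // Plain group: "\r\n" is tried first, the trailing \n then has nothing
        // left to match, so the engine backtracks and uses "\r" instead.
        boolean plain = Pattern.matches(PLAIN, crlf);   // true
        // Atomic group: the engine commits to "\r\n" and may not go back to
        // the "\r" alternative, so the trailing \n cannot match.
        boolean atomic = Pattern.matches(ATOMIC, crlf); // false
        System.out.println(plain + " " + atomic);
    }
}
```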