I'm trying to split a string on whitespace and non-word characters, keeping each non-word character as its own token, except that apostrophes should not trigger a split.
I can already split on whitespace and non-word characters while keeping the latter, but I can't figure out how to exclude apostrophes from the non-word characters.
This is my current regex...
str.split("\\s|(?=\\W)");
...which when run on this code sample:
program p;
begin
write('x');
end.
...produces this result:
program
p
;
begin
write
(
'x <!-- This is the problem.
'
)
;
end
.
Which is almost correct, but my goal is to skip the apostrophes so that this is the result:
program
p
;
begin
write
(
'x' <!-- This is the wanted result.
)
;
end
.
UPDATE
As suggested I've tried:
str.split("\\s|(?=\\W)(?<=\\W)");
Which almost works, but does not split all of the special characters correctly:
program
p;
begin
write(
'x'
)
;
end.
Have you tried...
[^\w']
This matches any character that is neither a word character nor an apostrophe. It may be simple enough to work, depending on your inputs.
If you run a replace operation using ([^\w']) as your regex (the parentheses are needed so that \1 has a group to refer to) and \n\1\n as your replacement string (written \n$1\n in Java), it should get you close to where you'd like to be.
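A minimal Java sketch of this replace-based idea, not a definitive implementation. It assumes the pattern is wrapped in a capture group and uses Java's `$1` replacement syntax; the class name `ReplaceThenSplit` and the method `tokenize` are hypothetical names chosen for illustration. After the replace, every isolated character sits on its own line, so a plain whitespace split finishes the job.

```java
import java.util.Arrays;
import java.util.List;

final class ReplaceThenSplit {
    // Hypothetical helper: put every character that is neither a word
    // character nor an apostrophe on its own line, then split on whitespace.
    static List<String> tokenize(String source) {
        // The parentheses create group 1; Java replacements refer to it as $1.
        String isolated = source.replaceAll("([^\\w'])", "\n$1\n");
        // trim() drops the trailing newline added after the final character.
        return Arrays.asList(isolated.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String code = "program p;\nbegin\nwrite('x');\nend.";
        tokenize(code).forEach(System.out::println);
    }
}
```

On the sample from the question this should print the eleven wanted tokens, with 'x' kept in one piece. One limitation to keep in mind: a quoted literal that contains a space or punctuation, such as 'a b', would still be broken apart, because only the apostrophe itself is protected.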
You can split on this.
\s|('[^']*')|(?=\W)
See demo.
https://regex101.com/r/mL7eL6/1
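One caveat when carrying this over to Java: String.split discards the text that the pattern matches and does not add capture groups to the result, so the ('[^']*') alternative would swallow the quoted literal rather than return it. A possible workaround, sketched below under that assumption, is to match the tokens instead of splitting between them. The pattern, the class name `QuotedTokenScanner`, and the method `tokenize` are illustrative choices, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTokenScanner {
    // Alternatives are tried in order: a quoted literal, a run of word
    // characters, or one character that is neither word nor whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("'[^']*'|\\w+|[^\\w\\s]");

    // Hypothetical helper: collect every match instead of splitting.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String code = "program p;\nbegin\nwrite('x');\nend.";
        tokenize(code).forEach(System.out::println);
    }
}
```

Because the quoted-literal alternative comes first, a literal such as 'a b' stays in one token, and whitespace is skipped simply because no alternative matches it. An apostrophe without a closing partner falls through to the last alternative and comes out as a token of its own.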