Java Regex: Split based on non-word characters except for apostrophe

Tags: java, regex

I'm trying to split a string on whitespace and non-word characters, keeping each non-word character as its own token, except for apostrophes.

The splitting itself works, but I can't figure out how to exclude apostrophes from the non-word characters.

This is my current Regex...

str.split("\\s|(?=\\W)");

...which when run on this code sample:

program p;
begin
    write('x');
end.

...produces this result:

program
p
;
begin

write
(
'x   <!-- This is the problem.
'
)
;
end
.

This is almost correct, but I want the apostrophes to stay attached to the text they quote, so that the result is:

program
p
;
begin

write
(
'x'   <!-- This is the wanted result.
)
;
end
.

UPDATE

As suggested, I've tried:

str.split("\\s|(?=\\W)(?<=\\W)");

This almost works, but it does not split off all of the special characters correctly:

program
p;
begin
write(
'x'
)
;
end.
Dark Knight asked Feb 07 '23 02:02


2 Answers

Have you tried...

[^\w']

This will match any character that is neither a word character nor an apostrophe. It may be simple enough to work, depending on your inputs.

If you run a replace operation using ([^\w']) as your regex (the capture group is needed for the back-reference) and \n$1\n as your replacement string, it should get you close to where you'd like to be.
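A minimal sketch of that replace-then-split idea in Java, in case it helps. Java's replaceAll writes group references as $1, and the character class has to sit inside a capturing group for the reference to resolve. The class name ReplaceThenSplit and the helper tokenize are made up for illustration, and the final split on whitespace is an assumption about what "close to where you'd like to be" needs.

```java
import java.util.Arrays;
import java.util.List;

class ReplaceThenSplit {
    // Hypothetical helper, not part of the original answer.
    public static List<String> tokenize(String src) {
        // Surround every character that is neither a word character nor an
        // apostrophe with newlines; $1 is the captured character itself.
        String padded = src.replaceAll("([^\\w'])", "\n$1\n");
        // Whitespace (original or inserted) now separates all tokens,
        // so splitting on runs of whitespace yields the pieces.
        return Arrays.asList(padded.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String src = "program p;\nbegin\n    write('x');\nend.";
        tokenize(src).forEach(System.out::println);
    }
}
```

For the sample program this keeps 'x' in one piece and drops the empty tokens. A quoted literal that contains spaces or punctuation, such as 'a b', would still be broken apart, so this only suits simple inputs.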

wpcarro answered Feb 08 '23 14:02


You can split on this.

\s|('[^']*')|(?=\W)

See demo.

https://regex101.com/r/mL7eL6/1
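One caveat when carrying this over to Java: String.split throws away whatever the pattern consumes, so a quoted literal matched by ('[^']*') would be treated as a delimiter and disappear from the result. A possible workaround is to keep the same "quoted literal first" alternation but collect matches with Matcher.find instead of splitting. The sketch below assumes that approach; the class name QuoteAwareTokenizer, the helper tokenize, and the exact token pattern are illustrative and not taken from the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // Alternatives are tried left to right:
    //   '[^']*'   a complete quoted literal, kept as one token
    //   \w+       a run of word characters
    //   [^\w\s]   any other single non-space character
    private static final Pattern TOKEN =
            Pattern.compile("'[^']*'|\\w+|[^\\w\\s]");

    // Hypothetical helper: collects matches instead of splitting.
    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "program p;\nbegin\n    write('x');\nend.";
        tokenize(src).forEach(System.out::println);
    }
}
```

Whitespace is skipped entirely, so no empty tokens appear, and a literal such as 'a b' stays in one piece. An apostrophe without a closing partner falls through to the last alternative and comes out as a single-character token.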

vks answered Feb 08 '23 16:02