I have the following test string:
This is my "te
st" case
with lines for "tes"t"ing" with regex
But as he said "It could be an arbitrary number of words"
I want to match everything between double quotes (") as long as the quotes are attached to words. I have the following regexp:
\"([^\"]*)\"
which matches the quoted "te st" quite well, even though it is split across two lines. Is there a way to match tes"t"ing as a whole word as well (and not split into two words)? Adding word boundaries, \b\"([^\"]*)\"\b, doesn't work well because it matches neither the very first " nor the group just mentioned.
I need it for Java regexp.
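To make the problem concrete, here is a minimal sketch of how the original pattern behaves on the tes"t"ing line. The class and method names (OriginalPatternDemo, quotedParts) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // Collects Group 1 of every match of the pattern from the question.
    static List<String> quotedParts(String input) {
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(input);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "with lines for \"tes\"t\"ing\" with regex";
        // The pattern pairs up quotes strictly left to right,
        // so the word is split into two separate matches.
        System.out.println(quotedParts(s)); // prints [tes, ing]
    }
}
```

The inner quotes are paired off as if they were ordinary delimiters, which is why the whole tes"t"ing is never captured as one group.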
UPDATE: As a result, I need to have:
This is my \q{te
st} case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
You may use
.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
Or, if the matches may span multiple lines, add the (?s) modifier:
.replaceAll("(?s)\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
Details
\B"\b - a " that is either at the start of the string or preceded by a non-word char, and that is followed by a word char
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\b"\B - a " that is either at the end of the string or followed by a non-word char, and that is preceded by a word char.
The replacement is a backslash ("\\\\"; the doubled literal backslash is necessary in the replacement part to insert a real, literal backslash, since a backslash is a special char in the replacement pattern), q{, the Group 1 value ($1) and a }.
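The two boundary checks can be tried in isolation. The sketch below is only an illustration (QuoteBoundaryDemo and positions are hypothetical names): it lists the index of every " that qualifies as an opening quote and every " that qualifies as a closing quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteBoundaryDemo {
    // Returns the start index of every match of the given regex.
    static List<Integer> positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "for \"tes\"t\"ing\" with";
        // Opening quote: non-word char (or start) on the left, word char on the right.
        System.out.println(positions("\\B\"\\b", s)); // prints [4]
        // Closing quote: word char on the left, non-word char (or end) on the right.
        System.out.println(positions("\\b\"\\B", s)); // prints [14]
    }
}
```

The two inner quotes (indices 8 and 10) have word chars on both sides, so they qualify as neither opening nor closing quotes and stay inside Group 1.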
See the Java demo:
String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\nBut as he said \"It could be an arbitrary number of words\"";
System.out.println(s.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}"));
Output (without (?s), the first quoted text is left unchanged because it spans a line break):
This is my "te

st" case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
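For the result requested in the question, where the quoted text that spans a line break is wrapped too, the DOTALL variant is needed. Here is a minimal sketch, assuming the pattern is reused and therefore compiled once with Pattern.DOTALL rather than the inline (?s). DotallQuoteDemo, QUOTED and wrapQuoted are illustrative names.

```java
import java.util.regex.Pattern;

class DotallQuoteDemo {
    // Same pattern as above; Pattern.DOTALL is equivalent to the inline (?s) flag.
    private static final Pattern QUOTED =
            Pattern.compile("\\B\"\\b(.*?)\\b\"\\B", Pattern.DOTALL);

    static String wrapQuoted(String input) {
        // "\\\\q{$1}" inserts a literal backslash, then q{, Group 1 and }.
        return QUOTED.matcher(input).replaceAll("\\\\q{$1}");
    }

    public static void main(String[] args) {
        String s = "This is my \"te\nst\" case\n"
                + "with lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        System.out.println(wrapQuoted(s));
    }
}
```

With this input, all three quoted spans are wrapped in \q{...}, including the first one that spans the line break.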
NOTE:
If you also need to match two consecutive double quotes that are neither preceded nor followed by word characters, you can modify the above regular expression as follows:
.replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}")
Details
(?s) - an embedded flag option (equal to Pattern.DOTALL) that makes . match line break chars, too
\B - a non-word boundary; here, it means that immediately to the left, there must be a non-word char or start of string (because after \B, there is a non-word char, ")
( - start of the first capturing group:
  "\b(.*?)\b" - a " followed by a word char, then Group 2 capturing any zero or more chars, as few as possible, and then a " that is preceded by a word char (that is why this pattern can't match "", since after the first " and before the second, there must be a letter, digit or _)
  | - or
  "" - a "" substring
) - end of the first capturing group
\B - a non-word boundary; here, it means that immediately to the right, there must be a non-word char or end of string (because before \B, there is a non-word char, ").
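A short sketch of the "" variant from the note, for illustration only (EmptyQuoteDemo and wrapQuoted are made-up names). When the "" alternative matches, Group 2 does not participate in the match, and Java's replaceAll appends nothing for it, so the result is an empty \q{}.

```java
class EmptyQuoteDemo {
    static String wrapQuoted(String input) {
        // Group 1 is the whole quoted token; Group 2 is the text between the quotes
        // and stays unset when the "" alternative matches.
        return input.replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}");
    }

    public static void main(String[] args) {
        System.out.println(wrapQuoted("empty \"\" and \"word\"")); // prints empty \q{} and \q{word}
    }
}
```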