 

Java: RegExp for matching words between a quote

Tags:

java

regex

I have the following test string:

This is my "te

st" case
with lines for "tes"t"ing" with regex
But as he said "It could be an arbitrary number of words"

I want to match everything between " characters, as long as the quotes are bound to words. I have the following regexp:

\"([^\"]*)\"

This matches the words of "test" quite well, even though they are split apart. Is there a way to also find tes"t"ing as a whole word (and not split it apart into two words)? Trying with the word boundaries \b (\b\"([^\"]*)\"\b) doesn't work very well, because it matches neither the very first " nor the group just mentioned.

I need it for Java regexp.
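To make the splitting concrete, here is a small sketch that runs the question's pattern with Pattern and Matcher and collects each Group 1 value (the class NaiveQuoteMatch and method quotedParts are hypothetical names, made up for illustration). As far as I can tell, tes"t"ing comes back as two captures, tes and ing, because [^"]* stops at the very next quote. The same [^"] class also matches newlines, which is why the split "te ... st" pair is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how the question's regex splits tes"t"ing
final class NaiveQuoteMatch {
    // The question's pattern: a quote, any run of non-quote chars (newlines included), a quote
    private static final Pattern NAIVE = Pattern.compile("\"([^\"]*)\"");

    // Collects the Group 1 value of every match, left to right
    static List<String> quotedParts(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = NAIVE.matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [tes, ing]: the first match ends at the first inner quote, and the middle t is skipped
        System.out.println(quotedParts("with lines for \"tes\"t\"ing\" with regex"));
    }
}
```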

UPDATE: As a result, I need to get:

This is my \q{te

st} case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
LeO asked Feb 21 '26 14:02



1 Answer

You may use

.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")

Or, if the matches may span multiple lines, add the (?s) modifier:

.replaceAll("(?s)\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")

See the regex demo.
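A minimal sketch comparing the two variants on the split "te ... st" fragment from the question (the class DotAllDemo and method convert are hypothetical names). Without (?s) the lazy .*? cannot cross the line breaks, so that pair stays untouched; with (?s) it is converted.

```java
// Hypothetical demo class: the answer's regex with and without the inline (?s) flag
final class DotAllDemo {
    static final String NO_DOTALL = "\\B\"\\b(.*?)\\b\"\\B";
    static final String DOTALL = "(?s)" + NO_DOTALL;
    // At runtime this is \\q{$1}: one literal backslash, "q{", Group 1, "}"
    static final String REPLACEMENT = "\\\\q{$1}";

    // spanLines = true lets . match line breaks, so quoted spans may cross lines
    static String convert(String input, boolean spanLines) {
        return input.replaceAll(spanLines ? DOTALL : NO_DOTALL, REPLACEMENT);
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case";
        System.out.println(convert(s, false)); // unchanged: .*? stops at the first \n
        System.out.println(convert(s, true));  // the pair becomes \q{te ... st}
    }
}
```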

Details

  • \B"\b - a " that is either at the start of the string or preceded by a non-word char, and that is followed by a word char
  • (.*?) - Group 1: zero or more chars other than line break chars (any chars when (?s) is on), as few as possible
  • \b"\B - a " that is either at the end of the string or followed by a non-word char, and that is preceded by a word char.
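The two boundary checks can also be tried on their own. This sketch (class QuoteBoundaryDemo and method positions are hypothetical names) reports the index of every quote that qualifies as an opening quote (\B"\b) or a closing quote (\b"\B) in a fragment of the test string. The two inner quotes of tes"t"ing sit between word chars on both sides, so neither check accepts them, which is why the lazy group runs on to the last quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: which quotes can open or close a match
final class QuoteBoundaryDemo {
    // Opening quote: non-word char (or start) on the left, word char on the right
    static final Pattern OPENING = Pattern.compile("\\B\"\\b");
    // Closing quote: word char on the left, non-word char (or end) on the right
    static final Pattern CLOSING = Pattern.compile("\\b\"\\B");

    // Start index of every match of p in input
    static List<Integer> positions(Pattern p, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "for \"tes\"t\"ing\" with";
        System.out.println(positions(OPENING, s)); // [4]
        System.out.println(positions(CLOSING, s)); // [14]
    }
}
```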

The replacement is a backslash ("\\\\"; the doubled literal backslash is necessary because a backslash is a special char in the replacement pattern, so inserting one real, literal backslash requires escaping it), then q{, the Group 1 value ($1), and a }.
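To see why the doubled backslash matters, this sketch (class ReplacementEscapeDemo and its methods are hypothetical names) applies the same regex with three replacement strings. With only "\\q{$1}" the replacement engine reads \q as an escaped, literal q, so the backslash is lost. Matcher.quoteReplacement is the JDK helper that escapes backslashes and dollar signs for you; it is shown here as an alternative, not as what the answer uses.

```java
import java.util.regex.Matcher;

// Hypothetical demo class: effect of backslash escaping in the replacement string
final class ReplacementEscapeDemo {
    static final String REGEX = "\\B\"\\b(.*?)\\b\"\\B";

    // Runtime replacement \\q{$1}: the escaped backslash yields one literal backslash
    static String withEscapedBackslash(String input) {
        return input.replaceAll(REGEX, "\\\\q{$1}");
    }

    // Runtime replacement \q{$1}: the backslash only escapes 'q' and disappears
    static String withSingleBackslash(String input) {
        return input.replaceAll(REGEX, "\\q{$1}");
    }

    // quoteReplacement turns \q{ into \\q{, giving the same result as withEscapedBackslash
    static String withQuoteReplacement(String input) {
        return input.replaceAll(REGEX, Matcher.quoteReplacement("\\q{") + "$1}");
    }

    public static void main(String[] args) {
        String s = "say \"hi\" now";
        System.out.println(withEscapedBackslash(s)); // say \q{hi} now
        System.out.println(withSingleBackslash(s));  // say q{hi} now
        System.out.println(withQuoteReplacement(s)); // say \q{hi} now
    }
}
```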

See the Java demo:

String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\nBut as he said \"It could be an arbitrary number of words\"";
System.out.println(s.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}"));

Output:

This is my "te

st" case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
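If you prefer passing the flag to Pattern.compile over embedding (?s), Pattern.DOTALL should behave the same. This sketch (class DesiredOutputDemo and method toQ are hypothetical names) compiles the pattern once and applies it to the full test string from the question; the result should match the output requested in the UPDATE, including the te ... st pair that the demo above leaves unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the answer's regex with DOTALL passed as a compile flag
final class DesiredOutputDemo {
    // Same pattern as the answer; Pattern.DOTALL replaces the inline (?s)
    static final Pattern QUOTED = Pattern.compile("\\B\"\\b(.*?)\\b\"\\B", Pattern.DOTALL);

    static String toQ(String input) {
        return QUOTED.matcher(input).replaceAll("\\\\q{$1}");
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\n"
                + "with lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        System.out.println(toQ(s));
    }
}
```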

NOTE:

If you also need to match two consecutive double quotes that are neither preceded nor followed by word characters, you can modify the above regular expression as follows:

 .replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}")

See the regex demo.

Details

  • (?s) - an embedded flag option (equivalent to Pattern.DOTALL) that makes . match line break chars, too
  • \B - a non-word boundary, here, it means that immediately to the left, there must be a non-word char or start of string (because after \B, there is a non-word char, ")
  • ( - start of the first capturing group:
    • "\b(.*?)\b" - a " followed by a word char, then Group 2 capturing zero or more chars, as few as possible, and then a " preceded by a word char (this is why this alternative can't match "": right after the first " and right before the second one, there must be a letter, digit or _)
    • | - or
    • "" - a "" substring
  • ) - end of the first capturing group
  • \B - a non-word boundary, here, it means that immediately to the right, there must be a non-word char or end of string (because before \B, there is a non-word char, ").
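A quick sketch of the note's pattern on a made-up input containing both an empty "" and a normal quoted word (class EmptyQuotesDemo and method convert are hypothetical names). When the "" branch matches, Group 2 does not participate, and Java's replaceAll inserts nothing for $2, so the empty pair becomes \q{}.

```java
// Hypothetical demo class: the note's regex, which also converts a bare ""
final class EmptyQuotesDemo {
    // Either a word-bounded quoted span (captured in Group 2) or a bare ""
    static final String REGEX = "(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B";

    static String convert(String input) {
        // For the "" branch Group 2 is unset, so $2 expands to an empty string
        return input.replaceAll(REGEX, "\\\\q{$2}");
    }

    public static void main(String[] args) {
        System.out.println(convert("He said \"\" and \"ok\".")); // He said \q{} and \q{ok}.
    }
}
```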
Wiktor Stribiżew answered Feb 23 '26 13:02



