I have a string like this:
a b c a b " a b " b a " a "
How do I match every a that is not part of a string delimited by "? In the string above, I want to match only the three a's outside the quotes (the first, second and fourth a), and not the two inside " a b " and " a ".
I want to replace those matches (or rather, remove them by replacing them with an empty string), so stripping out the quoted parts before matching won't work, because I want those to remain in the string. I'm using Ruby.
Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:
result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
This replaces every a with the empty string if and only if there is an even number of quotes ahead of the matched a.
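The rule that "an even number of quotes ahead" means "outside any quoted part" can be checked without a regex. Below is a minimal sketch that applies the same rule by counting the quotes to the right of each character. The method name remove_unquoted_a is hypothetical, and like the regex it assumes balanced quotes with no escapes.

```ruby
# Hypothetical helper: drop every "a" that has an even number of '"' after it.
# Assumes balanced, unescaped double quotes (same assumption as the regex).
def remove_unquoted_a(subject)
  quotes_ahead = subject.count('"') # quotes at or after the current index
  result = +''
  subject.each_char do |ch|
    # After this, quotes_ahead counts only quotes strictly to the right.
    quotes_ahead -= 1 if ch == '"'
    # An even count ahead means we are outside any quoted section.
    next if ch == 'a' && quotes_ahead.even?
    result << ch
  end
  result
end

puts remove_unquoted_a('a b c a b " a b " b a " a "')
```

For balanced, unescaped quotes this gives the same result as the gsub call with the lookahead regex.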
Explanation:
a # Match a
(?= # only if it's followed by...
(?: # ...the following:
[^"]*" # any number of non-quotes, followed by one quote
[^"]*" # the same again, ensuring an even number
)* # any number of times (0, 2, 4 etc. quotes)
[^"]* # followed by only non-quotes until
\Z # the end of the string.
) # End of lookahead assertion
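The commented breakdown above can be used almost verbatim with Ruby's x (free-spacing) flag, in which whitespace in the pattern is ignored and # starts a comment. This is a sketch under the same assumption (balanced, unescaped quotes); the constant name UNQUOTED_A is only for illustration.

```ruby
# The lookahead regex from above, written in free-spacing (/x) mode.
UNQUOTED_A = /
  a             # Match a
  (?=           # only if it's followed by...
    (?:         # ...the following:
      [^"]*"    # any number of non-quotes, followed by one quote
      [^"]*"    # the same again, ensuring an even number
    )*          # any number of times (0, 2, 4 etc. quotes)
    [^"]*       # followed by only non-quotes until
    \Z          # the end of the string.
  )             # End of lookahead assertion
/x

subject = 'a b c a b " a b " b a " a "'
result = subject.gsub(UNQUOTED_A, '')
puts result.inspect
```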
If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but more complicated:
result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:
(?: # Match either...
\\. # an escaped character
| # or
[^"\\] # any character except backslash or quote
) # End of alternation
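Here is a small comparison of the two regexes on a string whose quoted part contains escaped quotes. It is a sketch: the sample string is made up for illustration, and it assumes a backslash always escapes the character after it.

```ruby
# Made-up sample: the quoted part contains escaped quotes around an "a".
subject = 'a "say \"a\" twice" a'

# Simple version: treats every " as a delimiter, escaped or not.
simple  = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
# Escape-aware version: (?:\\.|[^"\\]) takes the place of [^"].
escaped = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

puts subject.gsub(simple, '')  # also removes the a between the escaped quotes
puts subject.gsub(escaped, '') # keeps the quoted a, removes the two outer ones
```

The simple regex counts the escaped quotes as delimiters, so it sees an even number of quotes after the inner a and wrongly removes it.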
js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
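As you can see, the regex is really tiny compared with the one in the accepted answer: ("[^"]*")|a
The left side of the alternation matches a complete "quoted string" and captures it in Group 1; the right side matches a bare a. In the replacement block, $1 holds the quoted string when the left side matched, so it is put back unchanged; when a bare a matched, $1 is nil, which gsub turns into an empty string, so the a is removed.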
subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
# Keep quoted strings ($1); a bare a leaves $1 nil, which gsub turns into ""
replaced = subject.gsub(regex) { $1 }
puts replaced
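The same capture-and-restore trick should also extend to escaped quotes by reusing the (?:\\.|[^"\\]) building block from the accepted answer inside the quoted-string alternative. This combination is not part of the original answer; it is a sketch that assumes a backslash always escapes the next character, and the sample string is made up.

```ruby
# Made-up sample with escaped quotes inside the quoted part.
subject = 'a "say \"a\" twice" a'

# Group 1: a whole quoted string, allowing \" (or any \x) inside it.
# Otherwise: a bare a outside of quotes.
regex = /("(?:\\.|[^"\\])*")|a/

# Return the quoted string unchanged; for a bare a, $1 is nil and becomes "".
replaced = subject.gsub(regex) { $1 }
puts replaced
```

On strings without backslashes this behaves exactly like ("[^"]*")|a.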
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...