 

How to match something with regex that is not between two special characters?

Tags:

regex

ruby

I have a string like this:

a b c a b " a b " b a " a "

How do I match every a that is not part of a string delimited by "? These are the a's I want to match, shown here in square brackets:

[a] b c [a] b " a b " b [a] " a "

I want to replace those matches (more precisely, remove them by replacing them with an empty string), so stripping the quoted parts out before matching won't work: those parts have to remain in the string. I'm using Ruby.

js-coder asked Jul 16 '12 10:07



2 Answers

Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

This replaces an a with the empty string if and only if an even number of quotes follows it (counting up to the end of the string), which means the a is not inside a quoted section.

Explanation:

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
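A minimal sketch that applies the one-line pattern to the question's string and also spells the same pattern out in Ruby's extended (/x) mode, so the explanation above lives inside the regex as comments. The constant name OUTSIDE_QUOTES_A is hypothetical; the pattern itself is the one from this answer.

```ruby
# The lookahead pattern from this answer, rewritten in extended (/x) mode.
# OUTSIDE_QUOTES_A is a hypothetical name used only for this sketch.
OUTSIDE_QUOTES_A = /
  a           # match a
  (?=         # only if it is followed by
    (?:       #   zero or more pairs of quotes:
      [^"]*"  #     any number of non-quotes, then one quote
      [^"]*"  #     the same again, so quotes come in pairs
    )*
    [^"]*     #   then only non-quotes
    \Z        #   up to the end of the string
  )
/x

subject = 'a b c a b " a b " b a " a "'

# The compact form from the answer and the commented /x form should agree.
compact = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
verbose = subject.gsub(OUTSIDE_QUOTES_A, '')

p compact             # => " b c  b \" a b \" b  \" a \""
p verbose == compact  # => true
```

Only the three a's outside the quotes are removed; both quoted sections keep their a.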

If quoted strings can contain escaped quotes (a "length: 2\""), it's still possible, but the regex gets more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
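To see what the extra complexity buys, here is a sketch that runs both patterns on a made-up input whose quoted part contains escaped quotes. The input string is an assumption chosen for illustration; it does not come from the question.

```ruby
# Made-up input: a single-quoted Ruby literal, so \" is a real backslash
# followed by a quote character inside the quoted section.
subject = 'a "a \"a\" a" a'

# Pattern for balanced quotes without escapes (first regex in this answer).
simple  = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

# Escape-aware pattern: (?:\\.|[^"\\]) stands in for [^"].
escaped = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

puts simple
puts escaped
```

With this input, the simpler pattern treats each \" as a real delimiter, so it also deletes the a between the two escaped quotes. The escape-aware pattern consumes \" as a single escaped character and deletes only the two a's outside the quoted string.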
Tim Pietzcker answered Oct 17 '22 20:10


js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

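As you can see, the regex is tiny compared with the one in the accepted answer: ("[^"]*")|a. The left side of the alternation matches an entire quoted string and captures it in group 1; the right side matches a bare a. Because the engine consumes each quoted string as a whole, an a inside quotes is never reached by the right side, so the replacement only has to put group 1 back and drop the bare a's.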

subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
# Keep quoted strings (group 1); for a bare a, $1 is nil, which becomes ''
replaced = subject.gsub(regex) { $1 }
puts replaced

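A sketch putting this approach side by side with the lookahead answer. The helper strip_outside_quotes and the constant names are hypothetical, and SKIP_QUOTED_ESCAPED is an adaptation (not part of either answer) that borrows the (?:\\.|[^"\\]) piece from the accepted answer so quoted strings may contain \".

```ruby
# Lookahead approach from the accepted answer.
LOOKAHEAD = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
# Alternation approach from this answer.
SKIP_QUOTED = /("[^"]*")|a/
# Assumed variant: quoted strings may contain escaped characters such as \".
SKIP_QUOTED_ESCAPED = /("(?:\\.|[^"\\])*")|a/

# Hypothetical helper: put group 1 (a quoted string) back unchanged.
# When the bare-a branch matched, $1 is nil and gsub inserts an empty string.
def strip_outside_quotes(str, regex)
  str.gsub(regex) { $1 }
end

plain = 'a b c a b " a b " b a " a "'
p strip_outside_quotes(plain, SKIP_QUOTED) == plain.gsub(LOOKAHEAD, '')  # => true

# Single-quoted literal, so \" is a backslash followed by a quote.
escaped = 'a "a \"a\" a" a'
puts strip_outside_quotes(escaped, SKIP_QUOTED_ESCAPED)
```

On the question's string both approaches give the same result. A design difference: the lookahead rescans the rest of the string for every candidate a, while the alternation consumes each quoted string once as it moves left to right.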

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

zx81 answered Oct 17 '22 21:10