Regex to match one or two quotes but not three in a row

For the life of me I can't figure this one out.

I need to search the following text, matching only the quotes on the lines marked "Match":

Don't match: """This is a python docstring"""

Match: " This is a regular string "

Match: "" ← That is an empty string

How can I do this with a regular expression?

Here's what I've tried:

Doesn't work:

(?!"")"(?<!"")

Close, but doesn't match double quotes.

Doesn't work:

"(?<!""")|(?!"")"(?<!"")|(?!""")"

I naively thought that I could list the cases I don't want as alternatives, but the logic ends up reversed. This one matches every quote, because each quote satisfies at least one of the alternatives.
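To make the two failures concrete, here is a small sketch that runs both attempts over the sample text. It assumes Python's `re` module stands in for the editor's regex engine (the editor's flavour may differ), and the variable names are only illustrative.

```python
import re

# Sample text from the question, one case per line.
text = "\n".join([
    'Don\'t match: """This is a python docstring"""',
    'Match: " This is a regular string "',
    'Match: "" <- That is an empty string',
])

# The two attempts, copied verbatim from the question.
first_try = re.compile(r'(?!"")"(?<!"")')
second_try = re.compile(r'"(?<!""")|(?!"")"(?<!"")|(?!""")"')

total_quotes = text.count('"')           # every double quote in the sample
first_hits = first_try.findall(text)     # only quotes with no quote neighbour
second_hits = second_try.findall(text)   # every quote, docstring ones included

print(total_quotes, len(first_hits), len(second_hits))
```

The first pattern finds only the two quotes around the regular string, and the second finds every quote in the text, including the docstring delimiters.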

(Please note: I'm not running the code, so solutions around using __doc__ won't help, I'm just trying to find and replace in my code editor.)

Nicole asked Dec 17 '13 19:12



2 Answers

You can use /(?<!")"{1,2}(?!")/


Autopsy:

  • (?<!") a negative look-behind for the literal ". The match cannot have this character in front
  • "{1,2} the literal " matched once or twice
  • (?!") a negative look-ahead for the literal ". The match cannot have this character after
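As a quick check of the breakdown above, this sketch applies the pattern to the sample text from the question. It assumes Python's `re` module rather than the editor's own engine, and the surrounding `/.../` delimiters are dropped because Python does not use them.

```python
import re

# Sample text from the question, one case per line.
text = "\n".join([
    'Don\'t match: """This is a python docstring"""',
    'Match: " This is a regular string "',
    'Match: "" <- That is an empty string',
])

# One or two quotes, with no quote directly before or after the run.
pattern = re.compile(r'(?<!")"{1,2}(?!")')

matches = pattern.findall(text)
print(matches)
```

The docstring delimiters are skipped, each quote of the regular string is matched on its own, and the empty string is matched as a single two-character run.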

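Your first try probably failed because the assertions sit on the wrong sides of the quote: (?!"") is a negative look-ahead placed before the match, and (?<!"") is a negative look-behind placed after it. In those positions each assertion also inspects the quote being matched, so the pattern only accepts a quote that has no other quote next to it. A look-behind normally goes before the match and a look-ahead after it.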


h2ooooooo answered Dec 15 '22 08:12


I realized that my original problem description was actually slightly wrong. That is, I actually need to match only a single quote character at a time, unless it's part of a group of three quote characters.

The difference is that this is desirable for editing so that I can find and replace with '. If I match "one or two quotes" then I can't automatically replace with a single character.

I came up with this modification to h2ooooooo's answer that satisfies that case:

(?<!"")(?<=(?!""").)"(?!"")


In the demo, you can see that the "" are matched individually, instead of as a group.

This works very similarly to the other answer, except:

  • it only matches a single "
  • (?<!"") and (?!"") reject the last and first quotes of a """, which leaves us matching everything we want, except that the middle quote of a """ still matches
  • finally, adding (?<=(?!""").) excludes that case specifically. It says: look back one character, then fail the match if the three characters starting from there are """
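Here is a rough sketch of the find-and-replace this pattern is meant for, run on the sample text from the question. It assumes Python's `re` module in place of the editor's engine, so behaviour in a given editor may differ.

```python
import re

# Sample text from the question, one case per line.
text = "\n".join([
    'Don\'t match: """This is a python docstring"""',
    'Match: " This is a regular string "',
    'Match: "" <- That is an empty string',
])

# Match one quote at a time, skipping all three quotes of a """ group.
pattern = re.compile(r'(?<!"")(?<=(?!""").)"(?!"")')

# Replace every matched double quote with a single quote.
result = pattern.sub("'", text)
print(result)
```

One caveat, at least under Python's `re`: the (?<=(?!""").) part needs a character before the quote, and . does not match a newline by default, so a quote that is the very first character of the text or of a line is left alone. That does not come up in the sample above.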


I decided not to change the question because I don't want to hijack the answer, but I think this can be a useful addition.

Nicole answered Dec 15 '22 09:12