For the life of me I can't figure this one out.
I need to search the following text, matching only the quotes in bold:
Don't match: """This is a python docstring"""
Match: " This is a regular string "
Match: "" ← That is an empty string
How can I do this with a regular expression?
Here's what I've tried:
Doesn't work:
(?!"")"(?<!"")
Close, but it doesn't match the two quotes of an empty string ("").
Doesn't work:
"(?<!""")|(?!"")"(?<!"")|(?!""")"
I naively thought that I could add the alternates that I don't want but the logic ends up reversed. This one matches everything because all quotes match at least one of the alternates.
(Please note: I'm not running the code, so solutions based on __doc__ won't help. I'm just trying to find and replace in my code editor.)
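To see concretely why both attempts fall short, here is a small check using Python's `re` module as a stand-in for the editor's regex engine (an assumption; editors differ in which look-around features they support). The sample strings and the `count` helper are made up for illustration.

```python
import re

# The two attempts from the question, written as raw strings.
attempt_1 = r'(?!"")"(?<!"")'
attempt_2 = r'"(?<!""")|(?!"")"(?<!"")|(?!""")"'

# Illustrative sample inputs (not taken from any real code base).
docstring = '"""This is a python docstring"""'
regular = 'x = " This is a regular string "'
empty = 'y = ""'


def count(pattern, text):
    """Return how many separate matches the pattern finds in text."""
    return len(re.findall(pattern, text))


for name, pattern in [("attempt 1", attempt_1), ("attempt 2", attempt_2)]:
    print(name,
          "docstring:", count(pattern, docstring),
          "regular:", count(pattern, regular),
          "empty:", count(pattern, empty))
```

Under these assumptions, attempt 1 finds only isolated quotes, so the empty string is missed, while attempt 2 finds all six quotes of the docstring.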
You can use /(?<!")"{1,2}(?!")/
Autopsy:
(?<!") is a negative look-behind for the literal ". The match cannot have this character in front.
"{1,2} is the literal " matched once or twice.
(?!") is a negative look-ahead for the literal ". The match cannot have this character after.
Your first try might've failed because (?!") is a negative look-ahead and (?<!") is a negative look-behind. It rarely makes sense to put a look-ahead before your match or a look-behind after it, because each then inspects the matched quote itself as well as its surroundings.
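A quick way to sanity-check this pattern outside the editor is a short Python sketch. It assumes the editor's engine handles look-arounds the same way as Python's `re` module, and the sample lines are invented for illustration.

```python
import re

# Pattern from the answer, without the surrounding / delimiters.
pattern = re.compile(r'(?<!")"{1,2}(?!")')

samples = [
    '"""This is a python docstring"""',   # triple quotes: no matches expected
    's = " This is a regular string "',   # each quote matched on its own
    'e = ""',                             # the pair matched as one unit
]

for line in samples:
    print(repr(line), "->", pattern.findall(line))
```

Note that the empty string comes back as a single two-character match here, not as two separate quotes.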
I realized that my original problem description was slightly wrong. What I actually need is to match a single quote character at a time, unless it's part of a group of 3 quote characters.
The difference matters for editing, because I want to find each quote and replace it with '. If I match "one or two quotes", then I can't automatically replace with a single character.
I came up with this modification to h20000000's answer that satisfies that case:
(?<!"")(?<=(?!""").)"(?!"")
In the demo, you can see that the "" are matched individually, instead of as a group.
This works very similarly to the other answer, except that the look-behind (?<!"") and the look-ahead (?!"") each check for two quotes, and the pattern matches exactly one ". That leaves us matching everything we want, except that it still matches the middle quote of a """.
Finally, adding (?<=(?!""").) excludes that case specifically, by saying "look back one character, then fail the match if the next three characters are """".
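The find-and-replace described above can be sketched in Python as a rough check. This assumes the editor's engine accepts a look-ahead nested inside a look-behind, as Python's `re` does; the `source` text and the variable names are illustrative only.

```python
import re

# Pattern from this answer: one quote that is not part of a """ run.
single_quote = re.compile(r'(?<!"")(?<=(?!""").)"(?!"")')

# Illustrative source text; every quote has a character before it on its line.
source = (
    'def f():\n'
    '    """Docstring."""\n'
    '    a = " regular "\n'
    '    b = ""\n'
)

# Replace each matched double quote with a single quote.
converted = single_quote.sub("'", source)
print(converted)
```

One caveat, at least in Python's `re`: the . inside the look-behind needs a preceding character on the same line, so a quote at the very start of the input or of a line is not matched by this pattern.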
I decided not to change the question because I don't want to hijack the answer, but I think this can be a useful addition.