
How can regex ignore escaped-quotes when matching strings?

I'm trying to write a regex that will match everything BUT an apostrophe that has not been escaped. Consider the following:

<?php $s = 'Hi everyone, we\'re ready now.'; ?>

My goal is to write a regular expression that will essentially match the string portion of that. I'm thinking of something such as

/.*'([^']).*/

in order to match a simple string, but I've been trying to figure out how to get a negative lookbehind working on that apostrophe to ensure that it is not preceded by a backslash...

Any ideas?

- JMT

JMTyler asked Jun 29 '09 21:06


1 Answer

Here's my solution with test cases:

/.*?'((?:\\\\|\\'|[^'])*+)'/

And here is my proof. It's in Perl, but I don't think I use any Perl-specific features; the possessive quantifier (*+) is also supported by PCRE, which PHP's preg functions use:

use strict;
use warnings;

my %tests = ();
$tests{'Case 1'} = <<'EOF';
$var = 'My string';
EOF

$tests{'Case 2'} = <<'EOF';
$var = 'My string has it\'s challenges';
EOF

$tests{'Case 3'} = <<'EOF';
$var = 'My string ends with a backslash\\';
EOF

foreach my $key (sort (keys %tests)) {
    print "$key...\n";
    if ($tests{$key} =~ m/.*?'((?:\\\\|\\'|[^'])*+)'/) {
        print " ... '$1'\n";
    } else {
        print " ... NO MATCH\n";
    }
}

Running this shows:

$ perl a.pl
Case 1...
 ... 'My string'
Case 2...
 ... 'My string has it\'s challenges'
Case 3...
 ... 'My string ends with a backslash\\'

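Note that the initial wildcard needs to be non-greedy (.*?) so that it stops at the first quote rather than the last. Inside the string I use a possessive (non-backtracking) quantifier, *+, to gobble up \\ and \' as pairs, and then anything else that is not a standalone quote character. The order of the alternatives matters: the two escape sequences are tried before [^'], so a backslash is always consumed together with the character it escapes.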
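The pattern is easier to read when spelled out with the /x flag. The following is only a minimal sketch, not part of the tested code above: the helper name first_single_quoted is hypothetical, and the leading .*? is dropped on the assumption that an unanchored match already scans forward to the first quote. It needs Perl 5.10 or later for the possessive quantifier.

```perl
use strict;
use warnings;

# The same alternatives as the pattern above, one per line.
my $single_quoted = qr{
    '               # opening quote
    (               # capture the string body
        (?:
            \\\\    # escaped backslash (two literal backslashes)
          | \\'     # escaped quote (a backslash, then a quote)
          | [^']    # any other character that is not a quote
        )*+         # possessive: matched text is never given back
    )
    '               # closing quote
}x;

# Hypothetical helper: body of the first single-quoted string, or undef.
sub first_single_quoted {
    my ($text) = @_;
    return $text =~ $single_quoted ? $1 : undef;
}

# The PHP line from the question, kept verbatim by the single-quoted heredoc.
my $php = <<'EOF';
<?php $s = 'Hi everyone, we\'re ready now.'; ?>
EOF

print first_single_quoted($php), "\n";
# prints: Hi everyone, we\'re ready now.
```

The captured body still contains the backslash escapes; unescaping them would be a separate step.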

I think this probably mimics how the language's own parser scans a single-quoted string, which should make it pretty bullet-proof.
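For comparison, here is a rough sketch of the negative-lookbehind idea from the question, run against Case 3. The lookbehind pattern is my own guess at what such a regex might look like, not something the asker posted. It treats a quote as closing only when the previous character is not a backslash, so it cannot tell an escaped quote from a quote that follows an escaped backslash.

```perl
use strict;
use warnings;

# Assumed lookbehind variant: a closing quote must not follow a backslash.
my $lookbehind = qr/'(.*?)(?<!\\)'/;

# The alternation-based pattern from this answer, without the leading .*?
my $alternation = qr/'((?:\\\\|\\'|[^'])*+)'/;

# Case 3 from the tests: the string body ends with an escaped backslash.
my $case3 = <<'EOF';
$var = 'My string ends with a backslash\\';
EOF

print "lookbehind:  ", ($case3 =~ $lookbehind  ? "'$1'" : "NO MATCH"), "\n";
print "alternation: ", ($case3 =~ $alternation ? "'$1'" : "NO MATCH"), "\n";
# prints:
# lookbehind:  NO MATCH
# alternation: 'My string ends with a backslash\\'
```

The lookbehind version rejects the only closing quote because a backslash precedes it, while the alternation consumes the two backslashes as a pair first.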

the.jxc answered Oct 02 '22 04:10