
variable length masking with preg_replace

I am masking all characters between single quotes (quotes included) within a string using preg_replace_callback(). I would like to use only preg_replace() if possible, but haven't been able to figure out how. Any help would be appreciated.

This is what I have using preg_replace_callback() which produces the correct output:

function maskCallback( $matches ) {
    return str_repeat( '-', strlen( $matches[0] ) );
}
function maskString( $str ) {
    return preg_replace_callback( "('.*?')", 'maskCallback', $str );
}

$str = "TEST 'replace''me' ok 'me too'";
echo $str,"\n";
echo maskString( $str ),"\n";

Output is:

TEST 'replace''me' ok 'me too'
TEST ------------- ok --------

I have tried using:

preg_replace( "/('.*?')/", '-', $str );

but each quoted segment collapses to a single dash, e.g.:

TEST -- ok -

Everything else I have tried doesn't work either. (I'm obviously not a regex expert.) Is this possible to do? If so, how?

Alan asked Jan 29 '14 18:01


1 Answer

Yes, you can do it (assuming that the quotes are balanced). Example:

$str = "TEST 'replace''me' ok 'me too'";
$pattern = "~[^'](?=[^']*(?:'[^']*'[^']*)*+'[^']*\z)|'~";    
$result = preg_replace($pattern, '-', $str);

The idea is: a character is replaced if it is a quote, or if it is followed by an odd number of quotes up to the end of the string (which means it sits inside a quoted part).
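
As a quick check of that idea against the sample string from the question, here is a minimal sketch. The helper name maskQuoted is made up for illustration, and the code assumes balanced quotes, as stated above.

```php
<?php
// Hypothetical helper (not from the original answer): masks quoted
// segments, quotes included. Assumes single quotes are balanced.
function maskQuoted(string $str): string
{
    // First branch: a non-quote character followed by an odd number of
    // quotes up to the end of the string, i.e. a character inside quotes.
    // Second branch: any quote.
    $pattern = "~[^'](?=[^']*(?:'[^']*'[^']*)*+'[^']*\z)|'~";
    return preg_replace($pattern, '-', $str);
}

echo maskQuoted("TEST 'replace''me' ok 'me too'"), "\n";
// TEST ------------- ok --------
```

Each match is exactly one character long, so replacing it with a single dash preserves the length of the string.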

To mask only the content and leave the quotes themselves in place:

$pattern = "~(?:(?!\A)\G|(?:(?!\G)|\A)'\K)[^']~";
$result = preg_replace($pattern, '-', $str);

The pattern matches a character only when it is contiguous to the previous match (in other words, when it comes immediately after the last match) or when it is preceded by a quote that is not contiguous to the previous match.

\G matches at the position where the previous match ended (for the first match attempt, this is the start of the string).
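
To see the contiguity effect of \G in isolation, here is a small sketch on an unrelated toy input. The helper name leadingDigits and the digit example are assumptions chosen only for illustration.

```php
<?php
// Hypothetical helper: collects the digits that form an unbroken run
// starting at the beginning of the subject.
function leadingDigits(string $subject): array
{
    // \G ties each match to the end of the previous one, so the first
    // non-digit breaks the chain and later digits are never reached.
    preg_match_all('~\G\d~', $subject, $matches);
    return $matches[0];
}

print_r(leadingDigits('123a45')); // matches 1, 2 and 3; 4 and 5 are skipped
```

The masking pattern relies on the same behaviour: once a closing quote interrupts the chain, only a new opening quote can start it again.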

pattern details:

~             # pattern delimiter

(?: # non capturing group: describe the two possibilities
    # before the target character

    (?!\A)\G  # at the position in the string after the last match
              # the negative lookahead ensures that this is not the start
              # of the string

  |           # OR

    (?:       # (to ensure that the quote is not a closing quote)
        (?!\G)   # not contiguous to the previous match
      |          # OR
        \A       # at the start of the string
    )
    '         # the opening quote

    \K        # remove all preceding characters from the match result
              # (only one quote here)
)

[^']          # a character that is not a quote

~

Note that since the closing quote is not matched by the pattern, the non-quote characters that follow it can't be matched, because there is no contiguous previous match.
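
Putting the \G pattern to work on the question's sample string could look like the sketch below. The helper name maskQuotedContent is hypothetical, and balanced quotes are assumed.

```php
<?php
// Hypothetical helper: masks the content of quoted segments and keeps
// the quotes themselves. Assumes single quotes are balanced.
function maskQuotedContent(string $str): string
{
    // Either continue right after the previous match, or start after an
    // opening quote (dropped from the match with \K).
    $pattern = "~(?:(?!\A)\G|(?:(?!\G)|\A)'\K)[^']~";
    return preg_replace($pattern, '-', $str);
}

echo maskQuotedContent("TEST 'replace''me' ok 'me too'"), "\n";
// TEST '-------''--' ok '------'
```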

EDIT:

The (*SKIP)(*FAIL) way:

Instead of testing whether a single quote is not a closing quote with (?:(?!\G)|\A)' as in the previous pattern, you can break the match contiguity on closing quotes using the backtracking control verbs (*SKIP) and (*FAIL) (the latter can be shortened to (*F)).

$pattern = "~(?:(?!\A)\G|')(?:'(*SKIP)(*F)|\K[^'])~";
$result = preg_replace($pattern, '-', $str);

Since the pattern fails on each closing quote, the characters that follow will not be matched until the next opening quote.

The pattern may be more efficient written like this:

$pattern = "~(?:\G(?!\A)(?:'(*SKIP)(*F))?|'\K)[^']~";

(You can also use (*PRUNE) in place of (*SKIP).)
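
A small sketch to compare the verb-based variants side by side. The function name maskWithVerbs and the array labels are made up for illustration; the (*PRUNE) entry is simply the second pattern with the verb swapped.

```php
<?php
// Hypothetical helper: runs each verb-based pattern on the same input
// and returns the results keyed by a label. Assumes balanced quotes.
function maskWithVerbs(string $str): array
{
    $patterns = [
        'skip'    => "~(?:(?!\A)\G|')(?:'(*SKIP)(*F)|\K[^'])~",
        'compact' => "~(?:\G(?!\A)(?:'(*SKIP)(*F))?|'\K)[^']~",
        'prune'   => "~(?:\G(?!\A)(?:'(*PRUNE)(*F))?|'\K)[^']~",
    ];
    $results = [];
    foreach ($patterns as $label => $pattern) {
        $results[$label] = preg_replace($pattern, '-', $str);
    }
    return $results;
}

foreach (maskWithVerbs("TEST 'replace''me' ok 'me too'") as $label => $masked) {
    echo $label, ': ', $masked, "\n";
}
// Each line shows: TEST '-------''--' ok '------'
```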

Casimir et Hippolyte answered Oct 08 '22 01:10