
preg_replace when not inside double quotes

Basically, I want to replace certain words (e.g. the word "tree" with the word "pizza") in sentences. Restriction: when the word that should be replaced is between double quotes, the replacement should not be performed.

Example:

The tree is green. -> REPLACE tree WITH pizza
"The" tree is "green". -> REPLACE tree WITH pizza
"The tree" is green. -> DONT REPLACE
"The tree is" green. -> DONT REPLACE
The ""tree is green. -> REPLACE tree WITH pizza

Is it possible to do this with regular expressions? I would count the number of double quotes before the word and check whether it is odd or even. But is this possible using preg_replace in PHP?

Thanks!

//EDIT:

At the moment my code looks like the following:

preg_replace("/tree/", "pizza", $sentence)

But the problem here is to implement the logic with the double quotes. I tried things like:

preg_replace('/[^"]tree/', 'pizza', $sentence)

But this does not work, because it only checks whether the character directly in front of the word is a double quote, and several of the examples above are not covered by that check. It is important that I solve this problem with regex only.

asked Dec 24 '13 21:12 by priojewo



2 Answers

Regular expressions are not the right tool for every job. They can handle this case to a certain extent, but once you need to cover every case, such as nested quotes, the pattern gets increasingly complicated.
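For comparison, the quote-counting idea from the question can be sketched without any lookaround by splitting the sentence on the double quote. This is only a rough sketch: the helper name replaceByQuoteParity is made up for illustration, and it assumes plain " characters with no escaping or nesting.

```php
<?php
// Hypothetical helper (not from the question or answers).
// Assumes quotes are plain " characters with no escaping or nesting.
function replaceByQuoteParity(string $sentence, string $search, string $replace): string
{
    // Splitting on " yields segments that alternate between outside and inside quotes.
    $parts = explode('"', $sentence);

    foreach ($parts as $i => $part) {
        // Even index: an even number of quotes precedes the segment, so it is outside quotes.
        if ($i % 2 === 0) {
            $pattern = '/\b' . preg_quote($search, '/') . '\b/i';
            // Fall back to the untouched segment if preg_replace() reports an error (null).
            $parts[$i] = preg_replace($pattern, $replace, $part) ?? $part;
        }
    }

    return implode('"', $parts);
}

echo replaceByQuoteParity('"The" tree is "green".', 'tree', 'pizza'), "\n";
echo replaceByQuoteParity('"The tree" is green.', 'tree', 'pizza'), "\n";
```

With an unmatched quote, this sketch treats everything after it as quoted.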

You could use a negative lookahead here: match tree only when it is not followed by an odd number of double quotes.

$text = preg_replace('/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i', 'pizza', $text);


Regular expression:

\b               the boundary between a word char (\w) and not a word char
 tree            'tree'
\b               the boundary between a word char (\w) and not a word char
(?!              look ahead to see if there is not:
 [^"]*           any character except: '"' (0 or more times)
  "              '"'
 (?:             group, but do not capture (0 or more times)
  (?:            group, but do not capture (2 times):
   [^"]*         any character except: '"' (0 or more times)
    "            '"'
  ){2}           end of grouping
 )*              end of grouping
 [^"]*           any character except: '"' (0 or more times)
 $               before an optional \n, and the end of the string
)                end of look-ahead
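As a quick check, the lookahead pattern can be run over the five sentences from the question. The variable names below are only illustrative.

```php
<?php
// Negative-lookahead pattern from above: "tree" must not be followed
// by an odd number of double quotes.
$pattern = '/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i';

// The example sentences from the question.
$sentences = [
    'The tree is green.',
    '"The" tree is "green".',
    '"The tree" is green.',
    '"The tree is" green.',
    'The ""tree is green.',
];

$results = [];
foreach ($sentences as $sentence) {
    $results[] = preg_replace($pattern, 'pizza', $sentence);
}

// Prints one result per line; only the third and fourth sentences stay unchanged.
echo implode("\n", $results), "\n";
```

Note that the lookahead counts quotes up to the end of the whole subject string, so each sentence is passed in separately here.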

Another option is to use backtracking control verbs, since PHP's PCRE engine supports them.

$text = preg_replace('/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i', 'pizza', $text);


The idea is to skip content in quotations. The first alternative matches an opening quote, any characters except ", and a closing quote. The (*FAIL) verb then forces that alternative to fail, and (*SKIP) prevents the regular expression engine from retrying the quoted substring with another alternative. Only occurrences of tree outside quotes can therefore reach the second alternative.
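A small sketch of the (*SKIP)(*FAIL) pattern on the sentences from the question follows. The last part uses an extra sentence with an unmatched quote, which is my own assumption and not from the question, to show that the two patterns can disagree on unbalanced input.

```php
<?php
$skipFail  = '/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i';
$lookahead = '/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i';

// The example sentences from the question.
$sentences = [
    'The tree is green.',
    '"The" tree is "green".',
    '"The tree" is green.',
    '"The tree is" green.',
    'The ""tree is green.',
];

$skipFailResults = [];
foreach ($sentences as $sentence) {
    $skipFailResults[] = preg_replace($skipFail, 'pizza', $sentence);
}
echo implode("\n", $skipFailResults), "\n";

// Illustrative input with a single unmatched quote (not from the question).
$unbalanced = 'A tree "and a tree';

// (*SKIP)(*FAIL) only skips complete "..." pairs, so both words are replaced.
$unbalancedSkipFail = preg_replace($skipFail, 'pizza', $unbalanced);

// The lookahead sees one quote after the first "tree" and leaves it alone.
$unbalancedLookahead = preg_replace($lookahead, 'pizza', $unbalanced);

echo $unbalancedSkipFail, "\n";  // A pizza "and a pizza
echo $unbalancedLookahead, "\n"; // A tree "and a pizza
```

Which behavior is preferable for unbalanced quotes depends on how such input should be interpreted.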

answered Sep 19 '22 01:09 by hwnd


There is a handy trick using some hidden regex powers:

~".*?"(*SKIP)(*FAIL)|\btree\b~s

Explanation:

~                   # start delimiter (we could have used /, #, @ etc...)
"                   # match a double quote
.*?                 # match anything ungreedy until ...
"                   # match a double quote
(*SKIP)(*FAIL)      # make it fail
|                   # or
\btree\b            # match the word tree, with word boundaries on both sides
~                   # end delimiter
s                   # the s modifier makes the dot (.) also match newlines

In actual PHP code, you would want to use preg_quote() to escape regex characters. Here's a little snippet:

$search = 'tree';
$replace = 'plant';
$input = 'The tree is green.
"The" tree is "green".
"The tree" is green.
"The tree is" green.
The ""tree is green.';

$regex = '~".*?"(*SKIP)(*FAIL)|\b' . preg_quote($search, '~') . '\b~s';
$output = preg_replace($regex, $replace, $input);
echo $output;
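To illustrate why preg_quote() matters, here is a minimal sketch that wraps the pattern in a helper and compares it with an unescaped search term. The function name replaceOutsideQuotes and the sample input are assumptions made for this example.

```php
<?php
// Hypothetical helper (name and signature are illustrative only).
function replaceOutsideQuotes(string $search, string $replace, string $input): string
{
    // preg_quote() escapes regex metacharacters in $search, plus the "~" delimiter.
    $regex = '~".*?"(*SKIP)(*FAIL)|\b' . preg_quote($search, '~') . '\b~s';

    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($regex, $replace, $input) ?? $input;
}

$input = 'Use v1.0 not "v1.0" or v1x0.';

// Escaped search term: the dot is matched literally.
$quoted = replaceOutsideQuotes('v1.0', 'v2.0', $input);

// Same pattern without preg_quote(): the dot now matches any character.
$unquoted = preg_replace('~".*?"(*SKIP)(*FAIL)|\bv1.0\b~s', 'v2.0', $input);

echo $quoted, "\n";   // Use v2.0 not "v1.0" or v1x0.
echo $unquoted, "\n"; // Use v2.0 not "v1.0" or v2.0.
```

Because of the \b anchors, this sketch only behaves as expected when the search term starts and ends with a word character.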


answered Sep 18 '22 01:09 by HamZa