Regex to match double quoted strings without variables inside php tags

Basically, I need a regex to match all double-quoted strings inside PHP tags that don't contain a variable.

Here's what I have so far:

"([^\$\n\r]*?)"(?![\w ]*')

and replace with:

'$1'

However, this would also match things outside PHP tags, e.g. HTML attributes.
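To see the problem concretely, the pattern can be run in JavaScript. This is only an illustrative sketch: the `g` flag, the hypothetical `naiveReplace` helper and the function-style replacement (which does the same as `'$1'`) are additions for the demo. It rewrites the HTML attributes, and it also leaves the apostrophe in `someval's` unescaped, which turns that PHP line into a syntax error.

```javascript
// The question's pattern as a JavaScript regex literal; the g flag is
// added so that every match on a line gets replaced.
const pattern = /"([^\$\n\r]*?)"(?![\w ]*')/g;

// Hypothetical helper: wraps the captured text in single quotes, exactly
// like the replacement '$1' (note: inner apostrophes are NOT escaped).
function naiveReplace(text) {
  return text.replace(pattern, (match, body) => "'" + body + "'");
}

const htmlLine = `<a href="somelink" attribute="value">Here's my "dog's website"</a>`;
const phpLine = `$somevar2 = "someval's got a quote inside";`;

console.log(naiveReplace(htmlLine));
// <a href='somelink' attribute='value'>Here's my 'dog's website'</a>
console.log(naiveReplace(phpLine));
// $somevar2 = 'someval's got a quote inside';
```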

Example case:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = "someval";
    $somevar2 = "someval's got a quote inside";
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = "someval " . $var . 'with concatenated' . $variables . "inside";
    $somevar5 = "this php tag doesn't close, as it's the end of the file...";

It should match and replace every place where the " should be replaced with a ', which means HTML attributes should ideally be left alone.

Example output after replace:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = 'someval';
    $somevar2 = 'someval\'s got a quote inside';
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = 'someval ' . $var . 'with concatenated' . $variables . 'inside';
    $somevar5 = 'this php tag doesn\'t close, as it\'s the end of the file...';
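Swapping the quote characters is only safe for some strings: in PHP, double-quoted strings interpret escapes such as `\n` and `\t`, while single-quoted strings only recognise `\'` and `\\`. A conservative per-literal conversion might look like the hypothetical `toSingleQuoted` helper below. It is a sketch that gives up (returns `null`) on anything it can't translate exactly, including every `$`, even ones PHP wouldn't actually interpolate.

```javascript
// Hypothetical helper: convert one PHP double-quoted literal (quotes
// included) to an equivalent single-quoted literal, or return null when
// the conversion could change what the string contains.
function toSingleQuoted(literal) {
  const body = literal.slice(1, -1); // drop the surrounding "..."
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const c = body[i];
    if (c === "$") return null; // possible interpolation: leave it alone
    if (c === "\\") {
      const next = body[i + 1];
      if (next === '"') { out += '"'; i++; continue; }     // \" is just "
      if (next === "$") { out += "$"; i++; continue; }     // \$ is just $
      if (next === "\\") { out += "\\\\"; i++; continue; } // \\ means \ in both
      return null; // other escapes (\n, \t, ...) would print literally in '...'
    }
    if (c === "'") { out += "\\'"; continue; } // ' must become \'
    out += c;
  }
  return "'" + out + "'";
}

console.log(toSingleQuoted(`"someval"`));                      // 'someval'
console.log(toSingleQuoted(`"someval's got a quote inside"`)); // 'someval\'s got a quote inside'
console.log(toSingleQuoted(`"someval with a $var inside"`));   // null
```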

It would also be great to be able to match inside script tags too, but that might be pushing it for one regex replace.

I need a regex approach, not a PHP approach. Let's say I'm using regex-replace in a text editor or JavaScript to clean up the PHP source code.

asked Jul 11 '13 08:07 by Harry Mustoe-Playfair


1 Answer

tl;dr

This is really too complex to be done with regex, and especially not with a simple regex. You might have better luck with nested regexes, but you really need to lex/parse the source to find your strings; then you could operate on them with a regex.
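A minimal JavaScript sketch of that nested-regex idea (the function name `convertPhpStrings` is hypothetical): an outer regex picks out each `<?php ... ?>` block, including one left open at the end of the file, and an inner regex rewrites only one-line double-quoted strings that contain no `$` and no backslash. On the question's example it produces the expected output, but it assumes there are no comments, escaped quotes, heredocs or `?>` inside strings, and the example below breaks every one of those assumptions.

```javascript
// Hypothetical sketch of the "nested regex" idea. Assumes no comments,
// heredocs, escaped quotes, or "?>" inside strings in the PHP code.
function convertPhpStrings(source) {
  // Outer regex: "<?php" up to the next "?>", or to the end of the input,
  // since the last PHP block in a file may be left open.
  const phpBlock = /<\?php[\s\S]*?(?:\?>|$)/g;
  // Inner regex: a one-line double-quoted string with no "$" (nothing to
  // interpolate) and no backslash (no escape sequences to translate).
  const simpleString = /"([^"\\$\r\n]*)"/g;

  return source.replace(phpBlock, (block) =>
    block.replace(simpleString, (match, body) =>
      // In a single-quoted PHP string only ' and \ need escaping, and the
      // inner regex already excluded backslashes.
      "'" + body.replace(/'/g, "\\'") + "'"
    )
  );
}
```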

Explanation

You can probably manage to do this. You can probably even manage to do it well, maybe even perfectly. But it's not going to be easy; it's going to be very, very difficult.

Consider this:

Welcome to my php file. We're not "in" yet.

<?php
  /* Ok. now we're "in" php. */

  echo "this is \"stringa\"";
  $string = 'this is \"stringb\"';
  echo "$string";
  echo "\$string";

  echo "this is still ?> php.";

  /* This is also still ?> php. */

?> We're back <?="out"?> of php. <?php

  // Here we are again, "in" php.

  echo <<<STRING
    How do "you" want to \""deal"\" with this STRING;
STRING;

  echo <<<'STRING'
    Apparently this is \\"Nowdoc\\". I've never used it.
STRING;

  echo "And what about \\" . "this? Was that a tricky '\"' to catch?";

  // etc...

Forget matching variable names in double-quoted strings. Can you even match all of the strings in this example? It looks like a nightmare to me, and SO's syntax highlighting certainly won't know what to do with it.
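To make that concrete, here is a small JavaScript check using a common pattern for double-quoted strings that does cope with backslash escapes. Run over just the first few lines of the example, it returns three matches: `"in"` from the plain text before `<?php`, `"in"` again from inside the comment, and only then the real string. The regex has no idea which context it is in.

```javascript
// A common double-quoted-string pattern: a quote, then any run of
// non-quote/non-backslash characters or backslash escapes, then a quote.
const dqString = /"(?:[^"\\]|\\.)*"/g;

// The first few lines of the example above (String.raw keeps the
// backslashes exactly as written in the PHP source).
const sample = String.raw`Welcome to my php file. We're not "in" yet.

<?php
  /* Ok. now we're "in" php. */

  echo "this is \"stringa\"";`;

const matches = sample.match(dqString);
for (const m of matches) console.log(m);
```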

Did you consider that variables may appear in heredoc strings as well?

I don't want to think about a regex that has to check:

  1. Are we inside <?php or <?= code?
  2. Are we outside a comment?
  3. Are we inside a quoted string?
  4. What type of quote opened it?
  5. Is this character a quote of that type?
  6. Is it preceded by \ (escaped)?
  7. Is the \ itself escaped??
  8. etc...
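For comparison, the checks in that list are what a small hand-written scanner does naturally, one character at a time. The JavaScript below is a minimal sketch, not a real PHP lexer: `findDoubleQuotedStrings` is a hypothetical name, and it deliberately skips heredoc/nowdoc (`<<<`) and whatever else the "etc..." covers. It ends a `//` or `#` comment at `?>`, as PHP does, and returns the double-quoted literals found in PHP code, which you could then filter or rewrite with a much simpler regex.

```javascript
// Hypothetical scanner sketch. States: outside PHP ("html"), PHP code,
// "//" or "#" comment, /* */ comment, and single- or double-quoted strings.
// Heredoc/nowdoc (<<<) is NOT handled.
function findDoubleQuotedStrings(src) {
  const found = [];   // double-quoted literals in PHP code, quotes included
  let state = "html";
  let start = 0;      // index where the current double-quoted literal began
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const two = src.slice(i, i + 2);
    if (state === "html") {
      if (src.startsWith("<?php", i)) { state = "php"; i += 5; continue; }
      if (src.startsWith("<?=", i)) { state = "php"; i += 3; continue; }
    } else if (state === "php") {
      if (two === "?>") { state = "html"; i += 2; continue; }
      if (two === "/*") { state = "block"; i += 2; continue; }
      if (two === "//" || c === "#") state = "line";
      else if (c === "'") state = "single";
      else if (c === '"') { state = "double"; start = i; }
    } else if (state === "line") {
      // A one-line comment ends at a newline or at "?>", whichever comes first.
      if (two === "?>") { state = "html"; i += 2; continue; }
      if (c === "\n") state = "php";
    } else if (state === "block") {
      if (two === "*/") { state = "php"; i += 2; continue; }
    } else {
      // Inside a string: a backslash always consumes the next character,
      // so \" and \\ never end the string early.
      if (c === "\\") { i += 2; continue; }
      const quote = state === "single" ? "'" : '"';
      if (c === quote) {
        if (state === "double") found.push(src.slice(start, i + 1));
        state = "php";
      }
    }
    i += 1;
  }
  return found;
}
```

Run over the example above with the two heredoc/nowdoc statements removed, this sketch returns seven literals, including `"this is still ?> php."` and `"out"`, and none of the `"in"` occurrences from plain text or comments.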

Summary

You can probably write a regex for this. You can probably manage with some backreferences and lots of time and care. But it's going to be hard, you're probably going to waste a lot of time, and if you ever need to fix it, you aren't going to understand the regex you wrote.

See also

This answer. It's worth it.

answered Oct 01 '22 22:10 by Jon Surrell