
Any pitfalls with this regex that matches ampersands not already encoded

Tags:

regex

php

In PHP, I want to encode ampersands that have not already been encoded. I came up with this regex

/&(?=[^a])/

It seems to work well so far, but since I'm not much of a regex expert, I'm asking whether anyone can see potential pitfalls in this regex.

Essentially it needs to convert & to &amp; but leave the & in &amp; as is (so as not to get &amp;amp;)

Thanks

Update

Thanks for the answers. It seems I wasn't thinking broadly enough to cover all bases. This seems like a common pitfall of regexes themselves (having to think of every input that could give your regex a false positive). It sure does beat my original one, str_replace(' & ', ' &amp; ', $string); :)

alex asked Nov 28 '22 20:11


2 Answers

Even better would be a negative lookahead assertion to verify that the & isn't followed by amp;

/&(?!amp;)/

Though that will change any ampersands used for other entities. If you're likely to have others, then how about something like

/&(?!#?[a-zA-Z0-9]+;)/

This matches an ampersand only when it is NOT followed by an optional hash symbol (for numeric entities), a series of alphanumerics and a semicolon, which should cover named and numeric entities like &quot; or &#170;

Test code

$text="It&#8217;s 30 &deg; outside & very hot. T-shirt &amp; shorts needed!";

$text=preg_replace('/&(?!#?[a-zA-Z0-9]+;)/', '&amp;', $text);

echo "$text\n";

Which will output

It&#8217;s 30 &deg; outside &amp; very hot. T-shirt &amp; shorts needed!

which is more easily read as "It’s 30 ° outside & very hot. T-shirt & shorts needed!"
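
To see where the entity-shaped lookahead helps and where it can still be fooled, here is a minimal sketch that wraps the pattern in a function. The name encode_bare_ampersands is hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer above):
// encodes "&" unless it already starts something shaped like an entity.
function encode_bare_ampersands(string $text): string
{
    return preg_replace('/&(?!#?[a-zA-Z0-9]+;)/', '&amp;', $text);
}

// A bare ampersand is encoded; an existing &amp; is left alone.
echo encode_bare_ampersands('T-shirt &amp; shorts & socks'), "\n";
// T-shirt &amp; shorts &amp; socks

// Mixed-case named, hex and decimal entities all match the lookahead.
echo encode_bare_ampersands('&Eacute; &#xAA; &#170;'), "\n";
// &Eacute; &#xAA; &#170;

// Limitation: "&T;" merely looks like an entity, so it is skipped.
echo encode_bare_ampersands('AT&T; Q&A'), "\n";
// AT&T; Q&amp;A
```

The last case shows that the pattern checks only the shape of what follows the ampersand, not whether it is a real entity name.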

Alternative for PHP 5.2.3+

As Ionut G. Stan points out below, from PHP 5.2.3 you can use htmlspecialchars with a fourth parameter of false to prevent double-encoding, e.g.

$text=htmlspecialchars($text,ENT_COMPAT,"UTF-8",false);
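
As a rough illustration of how this differs from the regex approach, the sketch below runs htmlspecialchars with double encoding turned off on a made-up sample string. The string and variable names are assumptions for the example only.

```php
<?php
// Sample input (an assumption for this sketch): existing entities,
// one bare ampersand, and some markup.
$text = 'It&#8217;s 30 &deg; outside & very hot. <b>T-shirt</b> &amp; shorts';

// Fourth argument false = do not re-encode entities that already exist.
$encoded = htmlspecialchars($text, ENT_COMPAT, 'UTF-8', false);

echo $encoded, "\n";
// It&#8217;s 30 &deg; outside &amp; very hot. &lt;b&gt;T-shirt&lt;/b&gt; &amp; shorts
```

Note that, unlike the regex, htmlspecialchars also converts <, > and double quotes, so it only suits text that is not supposed to contain markup.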
Paul Dixon answered Mar 15 '23 23:03


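Your regex will also match the ampersand of any other encoded character (anything that isn't followed by an "a"), so an entity such as &lt; would become &amp;lt;.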
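
A quick sketch of that behaviour, running the pattern from the question over a few made-up strings (the sample strings are assumptions chosen to show each case):

```php
<?php
// The pattern from the question: "&" followed by any one character other than "a".
$pattern = '/&(?=[^a])/';

$samples = [
    'fish & chips',   // bare ampersand: encoded as intended
    '1 &lt; 2',       // existing entity: wrongly double-encoded
    'bread &apples',  // bare ampersand before an "a": missed
    'this & that &',  // trailing ampersand: missed, the lookahead needs a character
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_replace($pattern, '&amp;', $sample);
    echo $sample, ' => ', $results[$sample], "\n";
}
// fish & chips => fish &amp; chips
// 1 &lt; 2 => 1 &amp;lt; 2
// bread &apples => bread &apples
// this & that & => this &amp; that &
```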

eglasius answered Mar 15 '23 23:03