In PHP, I want to encode ampersands that have not already been encoded. I came up with this regex
/&(?=[^a])/
It seems to work well so far, but since I'm not much of a regex expert, can anyone see any potential pitfalls in this regex?
Essentially it needs to convert & to &amp; but leave the & in &amp; as is (so as not to get &amp;amp;).
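To make the question concrete, here is a small sketch that runs the pattern above against a few inputs. The input strings are made up for illustration. The lookahead (?=[^a]) only checks that the next character is not "a", so it skips some bare ampersands and double-encodes other existing entities:

```php
<?php
// The pattern from the question.
$pattern = '/&(?=[^a])/';

// A bare & followed by "a" is skipped: the lookahead only rejects the letter "a".
echo preg_replace($pattern, '&amp;', 'Tom &and Jerry'), "\n"; // Tom &and Jerry

// A trailing & is skipped too: (?=[^a]) needs some character after the &.
echo preg_replace($pattern, '&amp;', 'Salt &'), "\n"; // Salt &

// Entities other than &amp; get double-encoded.
echo preg_replace($pattern, '&amp;', '&quot;hi&quot;'), "\n"; // &amp;quot;hi&amp;quot;
```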
Thanks
Thanks for the answers. It seems I wasn't thinking broadly enough to cover all bases. This seems to be a common pitfall of regexes in general: you have to think of every possibility that could give your regex false positives. It sure does beat my original one, str_replace(' & ', ' &amp; ', $string);
:)
Even better would be a negative lookahead assertion to verify that & isn't followed by amp;
/&(?!amp;)/
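A minimal sketch of that negative lookahead in use. The function name encode_bare_ampersands is hypothetical, chosen only for this example. The second call shows the limitation: entities other than &amp; still get their ampersand encoded.

```php
<?php
// Hypothetical helper: encode every & that does not already start "&amp;".
function encode_bare_ampersands(string $s): string
{
    return preg_replace('/&(?!amp;)/', '&amp;', $s);
}

echo encode_bare_ampersands('Fish & chips &amp; peas'), "\n";
// Fish &amp; chips &amp; peas

echo encode_bare_ampersands('&quot;quoted&quot;'), "\n";
// &amp;quot;quoted&amp;quot;
```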
Though that will still encode the ampersands that begin other entities. If you're likely to have those, then how about something like
/&(?!#?[a-zA-Z0-9]+;)/
This will look for an ampersand, asserting that it is NOT followed by an optional hash symbol (for numeric entities), a series of alphanumerics and a semicolon, which should cover named and numeric entities like &quot; or &#170;
$text="It&#8217;s 30 &deg; outside &amp; very hot. T-shirt & shorts needed!";
$text=preg_replace('/&(?!#?[a-zA-Z0-9]+;)/', '&amp;', $text);
echo "$text\n";
Which will output
It&#8217;s 30 &deg; outside &amp; very hot. T-shirt &amp; shorts needed!
which is more easily read as "It’s 30 ° outside & very hot. T-shirt & shorts needed!"
As Ionut G. Stan points out below, since PHP 5.2.3 you can pass false as the fourth parameter (double_encode) of htmlspecialchars to prevent double-encoding, e.g.
$text=htmlspecialchars($text,ENT_COMPAT,"UTF-8",false);
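A short sketch of the double_encode = false behaviour on a made-up input string. Note that, unlike the regex approaches, htmlspecialchars also encodes < and > and, with ENT_COMPAT, double quotes, so the output can differ from the preg_replace version when the text contains those characters.

```php
<?php
// Existing entities (&amp;, &quot;) are left alone; the bare & and < are encoded.
$in  = 'a < b & c &amp; d &quot;e&quot;';
$out = htmlspecialchars($in, ENT_COMPAT, 'UTF-8', false);
echo $out, "\n"; // a &lt; b &amp; c &amp; d &quot;e&quot;
```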
This also leaves any other already-encoded character as is.