Regex (in PHP) to match & characters that aren't part of HTML entities

Tags: regex, php, pcre

Here's the goal: to replace all standalone ampersands with &amp; but NOT replace those that are already part of an HTML entity such as &nbsp;.

I think I need a regular expression for PHP (preferably one for the preg_* functions) that matches only standalone ampersands, but I don't know how to write it for preg_replace().

asked Nov 30 '22 07:11 by Doug Kaye

2 Answers

PHP's htmlentities() has a double_encode argument for this: pass false as the fourth argument and existing entities won't be encoded again.
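A minimal sketch of the double_encode approach, assuming UTF-8 input and PHP 5.4 or later (needed for the ENT_HTML401 flag). The sample string is made up for illustration.

```php
<?php
// Encode special characters but leave existing entities alone.
// The fourth argument of htmlentities() is $double_encode; passing false
// means sequences that already form an entity (e.g. &nbsp;) are not
// turned into &amp;nbsp;.
$txt = 'Fish & chips&nbsp;&copy; 2022';

$encoded = htmlentities($txt, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);

echo $encoded, "\n";
// Fish &amp; chips&nbsp;&copy; 2022
```

Only the bare "& " is encoded; &nbsp; and &copy; pass through untouched.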

If you want to do this with a regular expression instead, a negative lookahead assertion is what you need:

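// Encode every & that does not already start a named (&nbsp;), decimal (&#160;) or hex (&#xA0;) entity
$txt = preg_replace('/&(?!(?:[[:alpha:]][[:alnum:]]*|#(?:[[:digit:]]+|[Xx][[:xdigit:]]+));)/', '&amp;', $txt);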
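Here is the same pattern wrapped in a small function so its behaviour is easy to check. The function name encode_bare_ampersands() and the sample input are hypothetical; they are not part of PHP or any library.

```php
<?php
// Hypothetical helper wrapping Kornel's pattern.
// The negative lookahead (?!...) lets "&" match only when it is NOT
// followed by a named entity (&nbsp;), a decimal entity (&#160;)
// or a hex entity (&#xA0;), each terminated by ";".
function encode_bare_ampersands(string $txt): string
{
    $pattern = '/&(?!(?:[[:alpha:]][[:alnum:]]*'        // named: &nbsp;
             . '|#(?:[[:digit:]]+|[Xx][[:xdigit:]]+)'   // numeric: &#160; or &#xA0;
             . ');)/';
    return preg_replace($pattern, '&amp;', $txt);
}

echo encode_bare_ampersands('Tom & Jerry&nbsp;&#160;&#xA0;&foo'), "\n";
// Tom &amp; Jerry&nbsp;&#160;&#xA0;&amp;foo
```

Note that the pattern only checks the shape of an entity, so something like &foo; (with the semicolon) is left alone even though it isn't a real HTML entity name; the trailing &foo above is encoded only because it lacks the ";".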
answered Dec 06 '22 09:12 by Kornel


You could always run html_entity_decode() before you run htmlentities(). That works unless you only want to encode ampersands (and even then you can play with the charset parameters).

Much easier and faster than a regex.
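A sketch of the decode-then-encode round trip, assuming UTF-8 text; the sample string is made up for illustration.

```php
<?php
// Decode everything first, then encode once.
// Entities already present (&nbsp;, &copy;) are turned into their characters
// by html_entity_decode() and re-encoded by htmlentities(), so they are
// never double-encoded; the bare "&" stays "&" and then becomes &amp;.
$txt = 'Fish & chips&nbsp;&copy; 2022';

$decoded = html_entity_decode($txt, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$encoded = htmlentities($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, "\n";
// Fish &amp; chips&nbsp;&copy; 2022
```

Be aware that the round trip normalizes existing entities: a numeric &#160; comes back as &nbsp;, and any literal character that has a named entity (é, for example) is encoded as well.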

answered Dec 06 '22 09:12 by Ross