I need to replace every & with &amp; in a string like this:
&Uuml;bung 1: &Uuml; & &Auml;
or, as rendered in HTML:
Übung 1: Ü & Ä
As you can see, the string contains HTML entities (but the bare & is not encoded as &amp;), so I need to exclude those from my replace. I'm not very familiar with regular expressions. All I need is an expression that does the following:
Search for an & that is either followed by a space, or is not followed by a run of non-space characters ending in a semicolon. Then replace that & with &amp;.
I tried something like this:
<cfset data = ReReplace(data, "&[ ]|[^(?*^( ));]", "&amp;", "ALL") />
but that replaces every character with &amp;... ^^'
Sorry, I really don't get these regex things.
I think it would be easier to simply replace all occurrences of & with &amp;, and then undo the replacement wherever it hit an existing entity:
<cfset data = ReReplace(ReReplace(data, "&", "&amp;", "ALL"), "&amp;([^;&]*;)", "&\1", "ALL") />
I haven't tested this in ColdFusion (since I have no clue how to), but it should work, because in JavaScript, the regex itself works:
var s = "I we&nt out on 1 se&123;p 2012 and& it was be&tter & than 15 jan 2012"
console.log(s.replace(/&/g, '&amp;').replace(/&amp;([^;&]*;)/g, '&$1'));
//"I we&amp;nt out on 1 se&123;p 2012 and&amp; it was be&amp;tter &amp; than 15 jan 2012"
So I assume the regex will also do its trick in CF.
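To see how the two-pass idea behaves on the string from the question, here is a minimal JavaScript sketch. The function name escapeTwoPass is hypothetical and only wraps the two replace calls shown above. The second input is an assumed edge case: the second pass cannot tell a real entity from a bare & that merely happens to be followed by a semicolon later on, so that & is restored as well.

```javascript
// Hypothetical helper wrapping the two-pass replace from the snippet above.
function escapeTwoPass(text) {
  return text
    .replace(/&/g, "&amp;")              // pass 1: escape every ampersand
    .replace(/&amp;([^;&]*;)/g, "&$1");  // pass 2: undo it where a ; follows
}

const entityResult = escapeTwoPass("&Uuml;bung 1: &Uuml; & &Auml;");
const edgeCaseResult = escapeTwoPass("salt & pepper; sugar");

console.log(entityResult);   // &Uuml;bung 1: &Uuml; &amp; &Auml;
console.log(edgeCaseResult); // salt & pepper; sugar  (the bare & is left unescaped)
```

If text such as the second input can occur in the data, a pattern that checks what an entity actually looks like is probably the safer choice.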
The main reason your attempted pattern &[ ]|[^(?*^( ));] fails is that it has a | with no group around it. That means you are replacing either &[ ] OR [^(?*^( ));], and the latter will match most things. You are also misunderstanding how character classes work.
Inside [..] (a character class) there are a few simple rules:
If it starts with a ^ it is negated, otherwise the ^ is literal.
A backslash either starts a shorthand class (e.g. \w), or escapes the following character (inside a char class this is only required for [ ] ^ - \).
Also, you don't need to put a space inside a character class. A literal space works fine (unless you are in free-spacing comment mode, which needs to be explicitly enabled).
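As a rough illustration of the two points above, the following JavaScript sketch runs the attempted pattern, and its second alternative on its own, against a short sample string. The sample string and the # replacement are assumptions chosen only to make the matches visible. JavaScript treats this particular pattern the same way with respect to alternation and character classes.

```javascript
// The attempted pattern, unchanged. The top-level | splits it into two alternatives.
const attempted = /&[ ]|[^(?*^( ));]/g;

// The second alternative alone: a negated class that excludes ( ? * ^ space ) ;
const secondAlternative = /[^(?*^( ));]/g;

const sample = "a & b;";

// Each match is replaced with # so it is easy to see what was hit.
const attemptedResult = sample.replace(attempted, "#");
const secondResult = sample.replace(secondAlternative, "#");

console.log(attemptedResult); // # ##;
console.log(secondResult);    // # # #;
```

Every character that is not one of the listed literals is matched by the negated class, which is why almost the whole string gets replaced.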
Hopefully that helps you understand what was going wrong?
As for actually solving your problem...
To match an ampersand that does not start an HTML entity, you can use:
&(?![a-z][a-z0-9]+;|#(?:\d+|x[\dA-F]+);)
That is, an ampersand, followed by a negative lookahead for either of:
a letter, then one or more letters or digits, then a semicolon, i.e. a named entity reference
a hash, then either a decimal number, or an x followed by a hex number, and finally a semicolon, i.e. a numeric entity reference.
To use this in CFML to replace & with &amp;, it would be:
<cfset data = rereplaceNoCase( data , '&(?![a-z][a-z0-9]+;|##(?:\d+|x[\dA-F]+);)' , '&amp;' , 'all' ) />
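For anyone who wants to try the lookahead pattern outside ColdFusion, here is a minimal JavaScript sketch of the same idea. The helper name escapeBareAmpersands is hypothetical. The i flag stands in for the case-insensitive rereplaceNoCase, and the hash is written once because the ## doubling is only needed to escape # inside CFML strings.

```javascript
// Same pattern as above: an & that is NOT followed by a named or numeric entity.
const bareAmpersand = /&(?![a-z][a-z0-9]+;|#(?:\d+|x[\dA-F]+);)/gi;

// Hypothetical helper name, used only for this illustration.
function escapeBareAmpersands(text) {
  return text.replace(bareAmpersand, "&amp;");
}

console.log(escapeBareAmpersands("&Uuml;bung 1: &Uuml; & &Auml;"));
// &Uuml;bung 1: &Uuml; &amp; &Auml;
console.log(escapeBareAmpersands("AT&T, &#220; and &#xDC;"));
// AT&amp;T, &#220; and &#xDC;
```

Because the lookahead leaves an existing &amp; alone, running the function a second time on its own output should not change the string again.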