I want only the unencoded characters to get converted to html entities, without affecting the entities which are already present. I have a string that has previously encoded entities, e.g.:
gaIUSHIUGhj>&hyphen; hjb&times;jkn.jhuh>hh> &hellip;
When I use htmlentities(), the & at the beginning of entities gets encoded again. This means &hyphen; and other entities have their & encoded to &amp;:
&amp;times;
I tried decoding the complete string, then encoding it again, but it does not seem to work properly. This is the code I tried:
header('Content-Type: text/html; charset=iso-8859-1');
...
$b = 'gaIUSHIUGhj>&hyphen; hjb&times;jkn.jhuh>hh> &hellip;';
$b = html_entity_decode($b, ENT_QUOTES, 'UTF-8');
$b = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $b);
$b = htmlentities($b, ENT_QUOTES, 'UTF-8');
But it does not seem to work the right way. Is there a way to prevent or stop this from happening?
The htmlentities() function converts characters to HTML entities. Tip: To convert HTML entities back to characters, use the html_entity_decode() function. Tip: Use the get_html_translation_table() function to return the translation table used by htmlentities().
htmlentities — Convert all applicable characters to HTML entities. htmlspecialchars — Convert special characters to HTML entities.
Double encoding is the act of encoding data twice in a row using the same encoding scheme. It is usually used as an attack technique to bypass authorization schemes or security filters that intercept user input.
Entity-quoting only HTML syntax characters. The following entities are converted: ampersands (&) are converted to &amp;, double quotes (") are converted to &quot;, and single quotes (') are converted to &#039; (if ENT_QUOTES is on, as described for htmlentities()).
Set the optional $double_encode parameter (the fourth argument of htmlentities()) to false. See the documentation for more information.
Your resulting code should look like:
$b = htmlentities($b, ENT_QUOTES, 'UTF-8', false);
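To see the difference the fourth argument makes, here is a minimal sketch that runs htmlentities() both ways on the same string. The sample string and the variable names are made up for illustration and are not taken from the question; the entities used (&times; and &hellip;) were chosen because they are part of the HTML 4.01 set that PHP uses by default.

```php
<?php
// Hypothetical sample input: a raw ">" plus two entities that are already encoded.
$input = '5 > 3 &times; 1 &hellip;';

// Default behaviour (double_encode = true): the "&" of existing entities is encoded again.
$doubleEncoded = htmlentities($input, ENT_QUOTES, 'UTF-8');

// double_encode = false: existing entities are left alone, raw characters are still encoded.
$encodedOnce = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

echo $doubleEncoded, PHP_EOL; // 5 &gt; 3 &amp;times; 1 &amp;hellip;
echo $encodedOnce, PHP_EOL;   // 5 &gt; 3 &times; 1 &hellip;
```

With this approach the decode, iconv, and re-encode round trip from the question should not be needed, as long as the charset passed to htmlentities() matches the actual encoding of the string.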