I'm trying to decode an encoded long dash from a numeric entity to a string, but I can't find a function that does this properly.
The best I have found is mb_decode_numericentity(); however, for some reason it fails to decode the long dash and some other special characters.
$str = '&#x2013;';
$str = mb_decode_numericentity($str, array(0xFF, 0x2FFFF, 0, 0xFFFF), 'ISO-8859-1');
This returns "?".
Does anyone know how to solve this problem?
Difference between htmlentities() and htmlspecialchars(): htmlspecialchars() converts only a few special characters (&, <, >, and quotes) to HTML entities, whereas htmlentities() converts every character that has an HTML entity equivalent.
HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as &lt; and &gt; for HTTP transmission.
Definition and Usage. The htmlentities() function converts characters to HTML entities. Tip: To convert HTML entities back to characters, use the html_entity_decode() function. Tip: Use the get_html_translation_table() function to return the translation table used by htmlentities().
Definition and Usage. The htmlspecialchars() function converts some predefined characters to HTML entities.
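To make the difference concrete, here is a small sketch. The sample string is made up for illustration, and the flags (ENT_QUOTES, 'UTF-8') are my choice rather than anything from the quoted descriptions:

```php
<?php
// Hypothetical sample text: markup, quotes, and one accented character (é as UTF-8 bytes).
$text = "<a href=\"x\">caf\xC3\xA9</a>";

// htmlspecialchars() only touches &, <, > and (with ENT_QUOTES) both quote characters.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also converts every character that has a named entity, such as é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // &lt;a href=&quot;x&quot;&gt;café&lt;/a&gt;
echo $entities, "\n";  // &lt;a href=&quot;x&quot;&gt;caf&eacute;&lt;/a&gt;

// html_entity_decode() reverses either encoding.
$roundTrip = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');
var_dump($roundTrip === $text); // bool(true)
```

If the page is already served as UTF-8, htmlspecialchars() is usually enough, since the accented characters can be sent as they are.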
The following code snippet (mostly stolen from here and improved) will work for literal, decimal numeric, and hexadecimal numeric entities:
header("content-type: text/html; charset=utf-8");
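/**
 * Decodes all HTML entities, including numeric and hexadecimal ones.
 *
 * @param string $string      text containing HTML entities
 * @param int    $quote_style quote flag passed to html_entity_decode()
 * @param string $charset     target character set
 * @return string decoded HTML
 */
function html_entity_decode_numeric($string, $quote_style = ENT_COMPAT, $charset = "utf-8")
{
    $string = html_entity_decode($string, $quote_style, $charset);
    $string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', "chr_utf8_callback", $string);
    // The /e modifier was removed in PHP 7, so decimal entities use a callback as well.
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return chr_utf8((int) $m[1]);
    }, $string);
    return $string;
}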
/**
 * Callback helper: converts the hexadecimal match to a UTF-8 character.
 */
function chr_utf8_callback($matches)
{
    return chr_utf8(hexdec($matches[1]));
}
/**
 * Multi-byte chr(): Will turn a numeric argument into a UTF-8 string.
 *
 * @param mixed $num
 * @return string
 */
function chr_utf8($num)
{
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}
$string = "&#x201D;";
echo html_entity_decode_numeric($string);
Improvement suggestions are welcome.
mb_decode_numericentity does not handle hexadecimal, only decimal. Do you get the expected result with:
$str = '&#8211;';
$str = mb_decode_numericentity ( $str , Array(255, 3145727, 0, 65535) , 'ISO-8859-1');
You can use hexdec to convert your hexadecimal to decimal.
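A rough sketch of that idea, in case it helps. The helper name hex_entities_to_decimal() and the conversion map are my own assumptions, not something from the question, and the mbstring extension is assumed to be loaded. Note that it decodes to UTF-8: ISO-8859-1 has no code for the long dash (U+2013), which would explain the "?".

```php
<?php
// Hypothetical helper: rewrites hexadecimal entities (&#x2013;) as decimal ones (&#8211;).
function hex_entities_to_decimal($str)
{
    return preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return '&#' . hexdec($m[1]) . ';';
    }, $str);
}

$str = hex_entities_to_decimal('&#x2013;');
echo $str, "\n"; // &#8211;

// Assumed conversion map: the whole Unicode range, no offset.
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

// Decode to UTF-8, which can represent the long dash.
$decoded = mb_decode_numericentity($str, $convmap, 'UTF-8');
echo bin2hex($decoded), "\n"; // e28093, the UTF-8 bytes of U+2013
```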
Also, out of curiosity, does the following work:
$str = '&#8211;';
$str = html_entity_decode($str);
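For what it's worth, here is a minimal sketch of that call with the charset spelled out. Passing 'UTF-8' explicitly is my assumption about what you want; before PHP 5.4 the default charset for html_entity_decode() was ISO-8859-1, so relying on the default can give different results on older installs.

```php
<?php
// Both the decimal and the hexadecimal form of the long dash (U+2013).
$decimal = html_entity_decode('&#8211;', ENT_QUOTES, 'UTF-8');
$hex     = html_entity_decode('&#x2013;', ENT_QUOTES, 'UTF-8');

// Print the raw bytes so the result does not depend on the terminal encoding.
echo bin2hex($decimal), "\n"; // e28093
echo bin2hex($hex), "\n";     // e28093

var_dump($decimal === $hex);  // bool(true)
```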