How to decode numeric HTML entities in PHP

I'm trying to decode an encoded long dash from a numeric entity to a string, but I can't find a function that does this properly.

The best I have found is mb_decode_numericentity(); however, for some reason it fails to decode the long dash and some other special characters.

$str = '&#8211;';

$str = mb_decode_numericentity($str, array(0xFF, 0x2FFFF, 0, 0xFFFF), 'ISO-8859-1');

This will return "?".

Does anyone know how to solve this problem?

2 Answers

The following code snippet (mostly stolen from here and improved) works for literal, numeric decimal, and numeric hexadecimal entities:

header("Content-Type: text/html; charset=utf-8");

/**
 * Decodes all HTML entities, including numeric and hexadecimal ones.
 *
 * @param mixed $string
 * @return string decoded HTML
 */
function html_entity_decode_numeric($string, $quote_style = ENT_COMPAT, $charset = "utf-8")
{
    $string = html_entity_decode($string, $quote_style, $charset);
    $string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', "chr_utf8_callback", $string);
    // The /e modifier was removed in PHP 7, so decimal entities use a callback as well.
    $string = preg_replace_callback('~&#([0-9]+);~', function ($matches) {
        return chr_utf8((int) $matches[1]);
    }, $string);
    return $string;
}

/**
 * Callback helper
 */
function chr_utf8_callback($matches)
{
    return chr_utf8(hexdec($matches[1]));
}

/**
 * Multi-byte chr(): Will turn a numeric argument into a UTF-8 string.
 *
 * @param mixed $num
 * @return string
 */
function chr_utf8($num)
{
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

$string = "&#x201D;";

echo html_entity_decode_numeric($string);

Improvement suggestions are welcome.
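
One possible improvement, offered as a sketch rather than a drop-in replacement: on PHP 7.2 or later with the mbstring extension, mb_chr() can stand in for a hand-written UTF-8 encoder. The function name decode_numeric_entities() is hypothetical and not part of the answer above; invalid code points are left untouched here, which is an assumption about the desired behaviour.

```php
<?php
/**
 * Hypothetical helper: decodes decimal (&#8211;) and hexadecimal (&#x2013;)
 * numeric entities to UTF-8 using mb_chr(). Named entities are not handled.
 */
function decode_numeric_entities(string $s): string
{
    return preg_replace_callback(
        '~&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));~i',
        function (array $m): string {
            // Group 1 holds hex digits, group 2 holds decimal digits.
            $codePoint = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            $char = mb_chr($codePoint, 'UTF-8');
            // mb_chr() returns false for invalid code points; keep the entity as-is.
            return $char === false ? $m[0] : $char;
        },
        $s
    );
}

$result = decode_numeric_entities('a &#x201D; b &#8211;');
echo bin2hex($result), "\n"; // prints 6120e2809d206220e28093
```

The bin2hex() call is only there to make the resulting bytes visible regardless of terminal encoding.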

mb_decode_numericentity does not handle hexadecimal, only decimal. Do you get the expected result with:

$str = '&#8211;';

$str = mb_decode_numericentity($str, array(255, 3145727, 0, 65535), 'ISO-8859-1');

You can use hexdec to convert your hexadecimal to decimal.
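
A minimal sketch of the hexdec idea, assuming the mbstring extension is available. The helper name hex_entities_to_decimal() is hypothetical. The sketch also assumes UTF-8 as the target encoding: ISO-8859-1 has no en dash character, which is likely why the code in the question returns "?".

```php
<?php
/**
 * Hypothetical helper: rewrites hexadecimal entities (&#x2013;)
 * as decimal entities (&#8211;) using hexdec().
 */
function hex_entities_to_decimal(string $str): string
{
    return preg_replace_callback(
        '~&#x([0-9a-f]{1,6});~i',
        function (array $m): string {
            return '&#' . hexdec($m[1]) . ';';
        },
        $str
    );
}

$str = hex_entities_to_decimal('&#x2013;'); // now '&#8211;'

// Convmap: decode code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$decoded = mb_decode_numericentity($str, $convmap, 'UTF-8');

echo bin2hex($decoded), "\n"; // prints e28093 (UTF-8 bytes of U+2013)
```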

Also, out of curiosity, does the following work:

$str = '&#8211;';

 $str = html_entity_decode($str);
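
For comparison, a short sketch of html_entity_decode() with an explicit UTF-8 target. The flag choice (ENT_QUOTES | ENT_HTML5) is an assumption for illustration; the point is that decimal, hexadecimal, and named forms of the en dash all decode to the same bytes when the target encoding can represent the character.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML5;

$fromDecimal = html_entity_decode('&#8211;', $flags, 'UTF-8');
$fromHex     = html_entity_decode('&#x2013;', $flags, 'UTF-8');
$fromNamed   = html_entity_decode('&ndash;', $flags, 'UTF-8');

// Print the raw bytes so the result does not depend on terminal encoding.
echo bin2hex($fromDecimal), "\n"; // prints e28093
echo bin2hex($fromHex), "\n";     // prints e28093
echo bin2hex($fromNamed), "\n";   // prints e28093
```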
