I have XML like this:
<formula type="inline">
    <default:math xmlns="http://www.w3.org/1998/Math/MathML">
        <default:mi>
            ℤ
        </default:mi>
    </default:math>
</formula>
My goal is to get rid of all special characters like ℤ by replacing them with their numeric character references.
I tried:
$test    = <content of the xml>;
$convmap = array(0x80, 0xffff, 0, 0xffff);
$test    = mb_encode_numericentity($test, $convmap, 'UTF-8');
But this does not replace the ℤ. Any ideas?
My goal is to get:
&#8484;
as shown here: http://www.fileformat.info/info/unicode/char/2124/index.htm
Thank you.
Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.
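For reference, here is a minimal sketch of going straight from a UTF-8 character to a decimal character reference with mb_encode_numericentity. It assumes the mbstring extension is loaded, PHP 7 or later (for the "\u{...}" escape), and that the input really is valid UTF-8 at the point of the call. The sample string and the wider convert map are illustrative and not taken from the question.

```php
<?php
// Sample input; "\u{2124}" yields the UTF-8 bytes of ℤ (U+2124). PHP 7+ syntax.
$test = "<default:mi>\u{2124}</default:mi>";

// Convert map: [first code point, last code point, offset, mask].
// 0x80..0x10FFFF covers every non-ASCII code point; ASCII markup is left alone.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$encoded = mb_encode_numericentity($test, $convmap, 'UTF-8');
echo $encoded, PHP_EOL; // <default:mi>&#8484;</default:mi>
```

If the same call leaves the character untouched on your data, it may be worth checking whether the string is actually UTF-8 when it reaches this line.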
You should be able to use htmlentities:
htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');
http://pt1.php.net/htmlentities
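ENT_SUBSTITUTE is not a replacement for ENT_XML1 but can be combined with it (ENT_XML1 | ENT_SUBSTITUTE). Invalid code unit sequences are then replaced with the Unicode replacement character U+FFFD (or &#xFFFD;) instead of the function returning an empty string.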
As an alternative, you could use strtr to convert the characters to something you specify:
$chars = array(
    // ℤ is U+2124, i.e. decimal 8484; the "\u{...}" escape needs PHP 7+
    "\u{2124}" => "&#8484;"
    ...
);
$convertedXML = strtr($xml, $chars);
http://php.net/strtr
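If writing the map by hand gets tedious, it could also be built from the input itself. The sketch below is one way to do that, assuming PHP 7.2 or later with mbstring (for mb_ord) and UTF-8 input. buildEntityMap is a hypothetical helper name used only for illustration; it is not part of PHP or of the approach described above.

```php
<?php
// Hypothetical helper: builds a strtr() replacement map from whatever
// non-ASCII characters occur in the given UTF-8 text.
function buildEntityMap(string $text): array
{
    // The /u flag makes PCRE treat the subject as UTF-8.
    // preg_match_all() returns 0 for no matches and false on invalid UTF-8.
    if (!preg_match_all('/[^\x00-\x7F]/u', $text, $matches)) {
        return [];
    }

    $map = [];
    foreach (array_unique($matches[0]) as $char) {
        // mb_ord() returns the code point, e.g. 8484 for U+2124.
        $map[$char] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return $map;
}

$xml = "<default:mi>\u{2124}</default:mi>";
$convertedXML = strtr($xml, buildEntityMap($xml));
echo $convertedXML, PHP_EOL; // <default:mi>&#8484;</default:mi>
```

Plain ASCII input produces an empty map, so strtr returns the string unchanged.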
Someone has done something similar on GitHub.