
PHP: How to encode text to numeric entities?

I have XML like this:

<formula type="inline">
    <default:math xmlns="http://www.w3.org/1998/Math/MathML">
        <default:mi>
            &Zopf;
        </default:mi>
    </default:math>
</formula>

My goal is to get rid of all named entities like &Zopf; by replacing them with their numeric entity representations.

I tried:

$test    = <content of the xml>;
$convmap = array(0x80, 0xffff, 0, 0xffff);
$test    = mb_encode_numericentity($test, $convmap, 'UTF-8');

But this does not replace the &Zopf; entity. Any ideas?

My goal is to get:

&#8484; 

as shown here: http://www.fileformat.info/info/unicode/char/2124/index.htm

Thank you.

Milos Cuculovic asked Oct 21 '22 18:10


1 Answer

Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.

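htmlentities escapes literal characters in a string:

htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');

http://pt1.php.net/htmlentities

Note that with ENT_XML1 it only escapes XML's markup characters (such as &, < and >) and never emits numeric references for other characters, so on its own it will not turn &Zopf; into &#8484;. ENT_SUBSTITUTE is a flag you combine with ENT_XML1 (ENT_XML1 | ENT_SUBSTITUTE) rather than a replacement for it: invalid code unit sequences then come back as the Unicode Replacement Character U+FFFD (or &#xFFFD;) instead of an empty string.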
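
The input here already contains the named entity &Zopf; as plain ASCII text rather than the character itself, which is why a convmap starting at 0x80 never touches it. One possible approach is to decode each named entity and re-encode the result numerically. The following is a minimal sketch, assuming the mbstring extension is loaded and that PHP's HTML5 entity table (ENT_HTML5) knows the entity; named_to_numeric is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: rewrite named entities (e.g. &Zopf;) as decimal numeric references.
function named_to_numeric(string $xml): string
{
    // The five entities XML predefines are valid as-is, so leave them alone.
    $keep = ['amp', 'lt', 'gt', 'quot', 'apos'];
    // Encode every code point of the decoded text, ASCII included,
    // so an entity that decodes to "<" or "&" cannot break the markup.
    $convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($keep, $convmap): string {
            if (in_array($m[1], $keep, true)) {
                return $m[0];
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // unknown entity name: leave untouched
            }
            return mb_encode_numericentity($decoded, $convmap, 'UTF-8');
        },
        $xml
    );
}

echo named_to_numeric('<default:mi>&Zopf;</default:mi>'), "\n";
```

For the sample element this should print <default:mi>&#8484;</default:mi>, while &amp; and &lt; stay as they are so the XML remains well-formed.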

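As an alternative, you could use strtr to replace each named entity (or the literal character) with the numeric reference you specify:

$chars = array(
    "&Zopf;"   => "&#8484;", // named entity as it appears in the XML
    "\u{2124}" => "&#8484;", // the literal character U+2124 (PHP 7+)
    // ...
);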

$convertedXML = strtr($xml, $chars);

http://php.net/strtr
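
When filling in a strtr map by hand it is easy to mix up hex and decimal code points, so the reference can be derived from the character instead. A small sketch, assuming PHP 7.2+ with mbstring for mb_ord; numeric_ref is a hypothetical helper name and the one-entry map is only illustrative.

```php
<?php
// Hypothetical helper: decimal numeric character reference for a single UTF-8 character.
function numeric_ref(string $char): string
{
    return '&#' . mb_ord($char, 'UTF-8') . ';';
}

// Map each named entity to a reference derived from its code point.
$chars = [
    '&Zopf;' => numeric_ref("\u{2124}"), // U+2124 DOUBLE-STRUCK CAPITAL Z
];

$xml = '<default:mi>&Zopf;</default:mi>';
echo strtr($xml, $chars), "\n";
```

U+2124 is 8484 in decimal, so the replacement written into the map is &#8484;, matching the value on the fileformat.info page linked in the question.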

Someone has done something similar on GitHub.

Alex W answered Oct 27 '22 11:10