
htmlentities not working for emoji

I am trying to show a character's HTML entity:

echo htmlentities(htmlentities("&"));
//outputs &amp;
echo htmlentities(htmlentities("<"));
//outputs &lt;

but it does not seem to work with emoji

echo htmlentities(htmlentities("😎"));
//outputs 😎

How can I get it to output &#128526;?


Edit:

I am trying to display a string input by the user with all of the HTML entities encoded:
echo htmlentities(htmlentities($input));

Example: "this & that 😎" -> "this &amp; that &#128526;"

Tony Brix asked Jan 22 '16 21:01




3 Answers

This works for characters that have named HTML entities, for UTF-8 emoji and other multibyte characters, and of course for plain ASCII strings.

I ran into trouble with an empty string as input, so I added a guard for it at the top of the function.

function entities( $string ) {
    $stringBuilder = "";
    $offset = 0;

    // Compare against "" explicitly: empty() would also swallow the string "0".
    if ( (string) $string === "" ) {
        return "";
    }

    while ( $offset >= 0 ) {
        $decValue = ordutf8( $string, $offset );
        $char = unichr($decValue);

        $htmlEntited = htmlentities( $char );
        if( $char != $htmlEntited ){
            $stringBuilder .= $htmlEntited;
        } elseif( $decValue >= 128 ){
            $stringBuilder .= "&#" . $decValue . ";";
        } else {
            $stringBuilder .= $char;
        }
    }

    return $stringBuilder;
}

// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset,1));
    if ($code >= 128) {        //otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2;                //110xxxxx
        else if ($code < 240) $bytesnumber = 3;        //1110xxxx
        else if ($code < 248) $bytesnumber = 4;    //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;        //10xxxxxx
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}

// originally from http://php.net/manual/en/function.chr.php#88611, which used
// mb_convert_encoding() with 'HTML-ENTITIES' (deprecated since PHP 8.2)
function unichr($u) {
    return mb_chr( intval( $u ), 'UTF-8' ); // requires PHP 7.2+
}

/* ---- */

// var_dump() prints its own newline and returns nothing, so no concatenation is needed
var_dump( entities( "&" ) );
var_dump( entities( "<" ) );
var_dump( entities( "😎" ) );
var_dump( entities( "☚" ) );
var_dump( entities( "" ) );
var_dump( entities( "A" ) );
var_dump( entities( "Hello 😎 world" ) );
var_dump( entities( "this & that 😎" ) );
Petr Hejda answered Oct 18 '22 23:10


$emoji = "\xF0\x9F\x98\x8E"; // your emoji (U+1F60E) as raw UTF-8 bytes

I took this callback from an answer about converting Unicode characters to hexadecimal HTML entities:

$hex = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);
    $utf = iconv('UTF-8', 'UCS-4', $char);
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $emoji);

echo $hex;

echo json_encode("\xF0\x9F\x98\x8E"); // prints "\ud83d\ude0e": JSON escapes it, whereas htmlentities leaves it untouched

Is this OK?
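The question asks for the decimal form (&#128526;) rather than the hexadecimal one, so here is a minimal sketch of a decimal variant of the callback above. It assumes PHP 7.2+ with the mbstring extension for mb_ord(). The function name toDecimalReferences is made up for this example, and returning the input unchanged when the string is not valid UTF-8 is an assumption, not something the answer specifies.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference such as &#128526;.
function toDecimalReferences(string $text): string
{
    $out = preg_replace_callback(
        '/[\x{80}-\x{10FFFF}]/u',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );

    // preg_replace_callback() returns null on invalid UTF-8 input;
    // falling back to the original text is an assumption of this sketch.
    return $out ?? $text;
}

echo toDecimalReferences("\xF0\x9F\x98\x8E"); // &#128526;
```

Note that this leaves ASCII characters such as & and < untouched, so it would still need to be combined with htmlentities() or htmlspecialchars() before output.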

FZE answered Oct 18 '22 23:10


The htmlentities documentation states that

all characters which have HTML character entity equivalents are translated into these entities.

Your emoji has no named equivalent the way < has &lt;, so it is not converted. &#128526; is a numeric character reference (the decimal code point of the character), not a named HTML entity.

function htmlEntitiesOrCode($string) {
    //try htmlentities first
    $result = htmlentities($string, ENT_COMPAT, "UTF-8");

    //if the output is different from input, an entity was returned
    if ($result != $string) {
        return $result;
    }

    //get the html code (only the first character of $string is decoded)
    $offset = 0;
    $code = ord(substr($string, $offset,1));
    if ($code >= 128) {
        if ($code < 224) {
            $bytesnumber = 2;
        } else if ($code < 240) {
            $bytesnumber = 3;
        } else if ($code < 248) {
            $bytesnumber = 4;
        }
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) {
        $offset = -1;
    }

    $result = "&#" . $code . ";"; //a numeric reference must end with a semicolon
    return $result;
}

HTML code function taken from here: http://php.net/manual/en/function.ord.php#109812
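For a whole string rather than a single character, the same idea (named entities where they exist, numeric references otherwise) can be sketched with mbstring's mb_encode_numericentity(). This swaps in a different technique from the manual decoding above. It assumes the mbstring extension is loaded, and the function name encodeAllEntities is made up for this example.

```php
<?php
// Hypothetical helper: named entities first, then decimal numeric
// references for every remaining non-ASCII character.
function encodeAllEntities(string $input): string
{
    // Step 1: characters with named entities (&amp;, &lt;, &eacute;, ...).
    $named = htmlentities($input, ENT_COMPAT, 'UTF-8');

    // Step 2: whatever is still outside ASCII, e.g. emoji.
    // Conversion map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($named, $map, 'UTF-8');
}

echo encodeAllEntities("this & that 😎"); // this &amp; that &#128526;
```

Running htmlentities() first matters: the ampersands it produces are ASCII, so the second step leaves them alone and nothing gets encoded twice.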

Mihai Răducanu answered Oct 18 '22 22:10