How to convert Emojis to their respective HTML code entities in PHP 5.3?

I need to convert emojis (e.g. 😀) in strings to their respective HTML code entities (e.g. &#128512;) on a PHP 5.3 site.

I need this so that user input is stored properly in the MySQL database of a legacy script and displays properly when shown back to the user. When emojis are saved directly from user input, they are incorrectly stored as ? in the database. The legacy script does not support utf8mb4 in MySQL (this solution failed), and converting its database, tables, and columns to utf8mb4 has not solved the problem. The only option I have left, which I have already confirmed works, is to convert user-entered emojis to their respective HTML code entities and store those entities as-is in the database. They then display correctly when retrieved, because modern browsers render emoji entities as emoji characters.

I have also tried this solution, but it does not work in PHP 5.3, only in 5.4 and above. (I cannot upgrade to 5.4 on this particular site because the legacy script it depends on only works in 5.3 and cannot be changed or upgraded under any circumstances.)

I have also tried this solution, which works in PHP 5.3, but you can't feed it a string, only the specific Emoji, so it does not solve my problem despite working in PHP 5.3.

I only need the emojis in a string converted, nothing else. (If that is not possible, I suppose I can live with other HTML entities being converted too, like & to &amp;, but I would prefer that not be the case.)

So how can I convert emojis in strings to their respective HTML code entities in PHP 5.3, such that a string like this & that 😎 gets converted to this & that &#128526;?

ProgrammerGirl asked Oct 31 '17 15:10


2 Answers

The code to detect the emoji exceeds Stack Overflow's character limit, so here's a gist instead:

https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php

The entire function is essentially just

preg_replace_callback($pattern, $callback, $string);

The string is the input containing the emoji that you want to change into HTML entities. The pattern is a regular expression that finds the emoji in the string; each match is then passed to the callback, which is where the conversion from emoji to HTML entity happens.

In creating this function, htmlemoji(), I combined a few different pieces of code that others had worked on. Here are the credits:

The callback uses this stackoverflow answer to build each entity.

The pattern was directly ripped from this source on GitHub.
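
The pattern/callback structure described above can be sketched as follows. This is only an illustration, not the gist's code: the function name `emoji_to_entities` is hypothetical, the pattern is a deliberately simplified assumption that matches only U+1F000 to U+1FAFF (the gist's pattern is much longer), and the callback assumes the mbstring extension is available. Closures are supported from PHP 5.3, so the sketch should run there.

```php
<?php
// Hypothetical helper for illustration; the gist's real function is htmlemoji().
function emoji_to_entities($string)
{
    // Simplified pattern (an assumption): a single supplementary-plane range.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/[\x{1F000}-\x{1FAFF}]/u';

    // Called once per match; $matches[0] holds one emoji character.
    $callback = function ($matches) {
        // One character in UTF-32BE is exactly four bytes (requires mbstring).
        $utf32 = mb_convert_encoding($matches[0], 'UTF-32BE', 'UTF-8');
        // 'N' unpacks a big-endian unsigned 32-bit integer: the code point.
        $parts = unpack('N', $utf32);
        return '&#' . $parts[1] . ';';
    };

    return preg_replace_callback($pattern, $callback, $string);
}

// "\xF0\x9F\x98\x8E" is the UTF-8 encoding of U+1F60E (😎).
echo emoji_to_entities("this & that \xF0\x9F\x98\x8E"), "\n";
// prints: this & that &#128526;
```

Because only characters matched by the pattern reach the callback, the `&` in the input is left untouched, which is what the question asks for.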

barry answered Sep 17 '22 10:09


I have created a trait for this. It is a mix of the two ideas below, and it covers emoji that were missing, like 🤩.

Ideas taken from https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php and https://github.com/chefkoch-dev/morphoji

trait ConvertEmojis {

/** @var string */
protected static $emojiPattern;

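public function convert($str) {
    // Replace every character matched by the emoji pattern with its entity.
    return preg_replace_callback($this->getEmojiPattern(), array($this, 'entity'), $str);
}

protected function entity($matches) {
    // Convert the matched character to UTF-32BE (four bytes, big-endian),
    // then read those bytes as the decimal code point.
    return '&#'.hexdec(bin2hex(mb_convert_encoding($matches[0], 'UTF-32BE', 'UTF-8'))).';';
}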

/**
 * Returns a regular expression pattern to detect emoji characters.
 *
 * @return string
 */
protected function getEmojiPattern()
{
    if (null === self::$emojiPattern) {
        $codeString = '';

        foreach ($this->getEmojiCodeList() as $code) {
            if (is_array($code)) {

                $first = dechex(array_shift($code));
                $last  = dechex(array_pop($code));
                $codeString .= '\x{' . $first . '}-\x{' . $last . '}';

            } else {
                $codeString .= '\x{' . dechex($code) . '}';
            }
        }

        self::$emojiPattern = "/[$codeString]/u";
    }

    return self::$emojiPattern;
}

/**
 * Returns an array with all unicode values for emoji characters.
 *
 * This is a function so the array can be defined with a mix of hex values
 * and range() calls to conveniently maintain the array with information
 * from the official Unicode tables (where values are given in hex as well).
 *
 * With PHP > 5.6 this could be done in class variable, maybe even a
 * constant.
 *
 * @return array
 */
protected function getEmojiCodeList()
{
    return [
        // Various 'older' characters, dingbats etc. which over time have
        // received an additional emoji representation.
        0x203c,
        0x2049,
        0x2122,
        0x2139,
        range(0x2194, 0x2199),
        range(0x21a9, 0x21aa),
        range(0x231a, 0x231b),
        0x2328,
        range(0x23ce, 0x23cf),
        range(0x23e9, 0x23f3),
        range(0x23f8, 0x23fa),
        0x24c2,
        range(0x25aa, 0x25ab),
        0x25b6,
        0x25c0,
        range(0x25fb, 0x25fe),
        range(0x2600, 0x2604),
        0x260e,
        0x2611,
        range(0x2614, 0x2615),
        0x2618,
        0x261d,
        0x2620,
        range(0x2622, 0x2623),
        0x2626,
        0x262a,
        range(0x262e, 0x262f),
        range(0x2638, 0x263a),
        0x2640,
        0x2642,
        range(0x2648, 0x2653),
        0x2660,
        0x2663,
        range(0x2665, 0x2666),
        0x2668,
        0x267b,
        0x267f,
        range(0x2692, 0x2697),
        0x2699,
        range(0x269b, 0x269c),
        range(0x26a0, 0x26a1),
        range(0x26aa, 0x26ab),
        range(0x26b0, 0x26b1),
        range(0x26bd, 0x26be),
        range(0x26c4, 0x26c5),
        0x26c8,
        range(0x26ce, 0x26cf),
        0x26d1,
        range(0x26d3, 0x26d4),
        range(0x26e9, 0x26ea),
        range(0x26f0, 0x26f5),
        range(0x26f7, 0x26fa),
        0x26fd,
        0x2702,
        0x2705,
        range(0x2708, 0x270d),
        0x270f,
        0x2712,
        0x2714,
        0x2716,
        0x271d,
        0x2721,
        0x2728,
        range(0x2733, 0x2734),
        0x2744,
        0x2747,
        0x274c,
        0x274e,
        range(0x2753, 0x2755),
        0x2757,
        range(0x2763, 0x2764),
        range(0x2795, 0x2797),
        0x27a1,
        0x27b0,
        0x27bf,
        range(0x2934, 0x2935),
        range(0x2b05, 0x2b07),
        range(0x2b1b, 0x2b1c),
        0x2b50,
        0x2b55,
        0x3030,
        0x303d,
        0x3297,
        0x3299,

        // Modifier for emoji sequences.
        0x200d,
        0x20e3,
        0xfe0f,

        // 'Regular' emoji unicode space, containing the bulk of them.
        range(0x1f000, 0x1f9cf)
    ];
}    

}
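
Note that `trait` and the short `[...]` array syntax were both introduced in PHP 5.4, so the code above will not parse on PHP 5.3 as written. A minimal sketch of the same idea as a plain class with `array()` syntax follows. The class name `EmojiEntityConverter` is hypothetical, the code list is abbreviated for illustration (the full list from the trait would be used in practice), ranges are stored as `array(first, last)` pairs instead of `range()` calls, and the mbstring extension is assumed.

```php
<?php
// Hypothetical class name; a PHP 5.3-style variant of the trait's idea.
class EmojiEntityConverter
{
    /** @var string|null Cached pattern, built on first use. */
    protected static $emojiPattern = null;

    public function convert($str)
    {
        return preg_replace_callback(
            self::getEmojiPattern(),
            array($this, 'entity'),
            $str
        );
    }

    // Kept public so the callback is callable from any scope.
    public function entity($matches)
    {
        // One character in UTF-32BE is four bytes; 'N' reads them as an integer.
        $utf32 = mb_convert_encoding($matches[0], 'UTF-32BE', 'UTF-8');
        $parts = unpack('N', $utf32);
        return '&#' . $parts[1] . ';';
    }

    protected static function getEmojiPattern()
    {
        if (null === self::$emojiPattern) {
            $class = '';
            foreach (self::getEmojiCodeList() as $code) {
                if (is_array($code)) {
                    // array(first, last) describes an inclusive range.
                    $class .= sprintf('\x{%x}-\x{%x}', $code[0], $code[1]);
                } else {
                    $class .= sprintf('\x{%x}', $code);
                }
            }
            self::$emojiPattern = '/[' . $class . ']/u';
        }

        return self::$emojiPattern;
    }

    protected static function getEmojiCodeList()
    {
        // Abbreviated for illustration only; not the full emoji list.
        return array(
            0x203c,
            0x2049,
            array(0x2600, 0x2604),
            0x200d,
            0xfe0f,
            array(0x1f000, 0x1f9cf),
        );
    }
}

$converter = new EmojiEntityConverter();
// "\xF0\x9F\xA4\xA9" is the UTF-8 encoding of U+1F929 (🤩).
echo $converter->convert("Wow \xF0\x9F\xA4\xA9"), "\n";
// prints: Wow &#129321;
```

As in the trait, the zero-width joiner and variation selector are part of the character class, so they are emitted as entities too; browsers should still combine the resulting sequence when rendering.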

Andyc answered Sep 18 '22 10:09