Is it possible to re-encode emoji 3 or 4 byte strings into emoji again?
I inherited a MySQL InnoDB table with the utf8_unicode_ci collation. These 4-byte emoji strings are everywhere. Is it possible to translate them back into emoji?
The first step was to change the character set to utf8mb4. This changed all strings like � to strings like this: ðŸ˜Š.
But what I really want is to translate ðŸ˜Š into something like 😊. (I have no idea if ðŸ˜Š is really a smiley.)
The Unicode Standard assigns a number to every emoji. Each one is represented as a "code point", a number conventionally written in hexadecimal with a U+ prefix; 😊, for example, is U+1F60A.
The difference between Unicode and UTF-8: Unicode is a character set, and UTF-8 is an encoding. Unicode is a list of characters, each with a unique number (its code point); UTF-8 is one way of storing those numbers as bytes.
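As a small illustration of that distinction (a sketch, not part of the original answer), the Python lines below print the code point of one emoji and the four bytes UTF-8 uses to store it. Picking 😊 as the sample character is an assumption made for the example.

```python
# Code point vs. UTF-8 encoding for a single emoji.
emoji = "\U0001F60A"  # SMILING FACE WITH SMILING EYES (sample character, chosen for illustration)

code_point = "U+{:04X}".format(ord(emoji))   # the Unicode number of the character
utf8_bytes = emoji.encode("utf-8")           # the bytes UTF-8 uses to store it
utf8_hex = " ".join("{:02X}".format(b) for b in utf8_bytes)

print(code_point)        # U+1F60A
print(utf8_hex)          # F0 9F 98 8A
print(len(utf8_bytes))   # 4
```

Those four bytes are why MySQL needs utf8mb4 for this character: the older utf8 character set stores at most three bytes per character.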
One way to check the character encoding a web page declares is to use the W3C Markup Validation Service. The validator usually detects the character encoding from the HTTP headers and information in the document. If it fails to detect the encoding, one can be selected on the validator result page via the 'Encoding' pulldown menu.
Inspired by Ignacio Vazquez-Abrams' comment. The following Python snippet shows how the mojibake originates (emoji to mojibake) and how to reverse it (repair):
print("\nEmoji to mojibake (origin):")
for emojiChar in ['😊', '😣', '👽', '😎']:
    # UTF-8 bytes misread as cp1252 produce the mojibake
    print(emojiChar, emojiChar.encode('utf8').decode('cp1252'))
print("\nmojibake to Emoji (repair):")
for mojibakeString in ['ðŸ˜Š', 'ðŸ˜£', 'ðŸ‘½', 'ðŸ˜Ž', 'ðŸ™‡']:
    # Re-encode as cp1252 to recover the original bytes, then decode them as UTF-8
    print(mojibakeString, mojibakeString.encode('cp1252').decode('utf8'))
I know the question is tagged php rather than python; I hope an analogous PHP solution would be very close…
Output:
==> chcp 65001
Active code page: 65001
==> D:\test\Python\20108312.py
Emoji to mojibake (origin):
😊 ðŸ˜Š
😣 ðŸ˜£
👽 ðŸ‘½
😎 ðŸ˜Ž
mojibake to Emoji (repair):
ðŸ˜Š 😊
ðŸ˜£ 😣
ðŸ‘½ 👽
ðŸ˜Ž 😎
ðŸ™‡ 🙇
==>
Python version:
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32
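For cleaning stored values in bulk, the same round trip could be wrapped in a helper that leaves text alone when it is not cp1252 mojibake. This is only a sketch: the name repair_mojibake is made up for illustration, and it assumes the damage came from UTF-8 bytes being read as cp1252, as in the snippet above.

```python
def repair_mojibake(text):
    """Undo UTF-8-read-as-cp1252 damage; return text unchanged if that does not apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or the recovered
        # bytes are not valid UTF-8. In both cases keep the original string.
        return text


broken = "\u00f0\u0178\u02dc\u0160"           # mojibake form of U+1F60A, written with escapes
print(ascii(repair_mojibake(broken)))         # the repaired emoji, shown as an escape
print(ascii(repair_mojibake("plain text")))   # ASCII text passes through unchanged
print(ascii(repair_mojibake("caf\u00e9")))    # valid non-mojibake text is kept as-is
```

One caveat: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so an emoji whose UTF-8 form contains one of them may not round-trip this way. The helper returns such strings unchanged rather than repairing them.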