
Many emoji characters are not read by python file read

I have a JSON file containing a dictionary of 1500 emoji characters, and I want to import it into my Python code. I read the file and converted it to a Python dictionary, but I end up with only 143 records. How can I import all the emoji into my code? This is my code:

import sys
import ast

file = open('emojidescription.json','r').read()
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
emoji_dictionary = ast.literal_eval(file.translate(non_bmp_map))

#word = word.replaceAll(",", " ");

keys = list(emoji_dictionary["emojis"][0].keys())
values = list(emoji_dictionary["emojis"][0].values())

file_write = open('output.txt','a')

print(len(keys))
for i in range(len(keys)):
    try:
        content = 'word = word.replace("{0}", "{1}")'.format(keys[i],values[i][0])
    except Exception as e:
        content = 'word = word.replace("{0}", "{1}")'.format(keys[i],'')
    #file.write()
    #print(keys[i],values[i])
    print(content)


file_write.close()

This is my input sample

{

    "emojis": [
        {

            "👨‍🎓": ["Graduate"],
            "©": ["Copy right"],
            "®": ["Registered"],
            "👨‍👩‍👧": ["family"],
            "👩‍❤️‍💋‍👩": ["love"],
            "™": ["trademark"],
            "👨‍❤‍👨": ["love"], 
            "⌚": ["time"],
            "⌛": ["wait"], 
            "⭐": ["star"],
            "🐘": ["Elephant"],
            "🐕": ["Cat"],
            "🐜": ["ant"],
            "🐔": ["cock"],
            "🐓": ["cock"],

This is my result; the 143 is the number of emoji that were read.

143

word = word.replace("�‍�‍�‍�", "family")

word = word.replace("Ⓜ", "")

word = word.replace("♥", "")

word = word.replace("♠", "")

word = word.replace("⌛", "wait")

asked Jun 10 '17 08:06 by CDR



1 Answer

I'm not sure why you're seeing only 143 records from an input of 1500 (your sample doesn't seem to display this behavior).
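One plausible cause, offered as an assumption rather than something confirmed against the full file: the translate step maps every character outside the BMP to U+FFFD, so keys that differ only in such characters become identical, and a dict literal keeps only the last value for a repeated key. A minimal sketch with a hypothetical three-entry sample:

```python
import ast
import sys

# Hypothetical sample: two keys are outside the BMP, one is inside it.
source = '{"\U0001F418": ["Elephant"], "\U0001F41C": ["ant"], "\u231B": ["wait"]}'

# Same mapping as in the question: every non-BMP code point becomes U+FFFD.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
translated = source.translate(non_bmp_map)

with_translate = ast.literal_eval(translated)   # both non-BMP keys are now "\ufffd"
without_translate = ast.literal_eval(source)    # keys are left untouched

print(len(with_translate))     # 2
print(len(without_translate))  # 3
```

If this is what happens with the real file, parsing the text without the translate step should keep the keys distinct.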

The setup doesn't seem to do anything useful, but what you're doing boils down to (simplified and skipping lots of details):

d = ...  # read the JSON file into a Python dict
keys = list(d.keys())      # list() makes the dict views indexable in Python 3
values = list(d.values())
for i in range(len(keys)):
    key = keys[i]
    value = values[i]

and that should be completely correct. There are better ways to do this in Python, however, like using the zip function:

d = ...  # read the JSON file into a Python dict
keys = d.keys()
values = d.values()
for key, value in zip(keys, values):  # zip picks pair-wise elements
    ...

or simply asking the dict for its items:

for key, value in d.items():
    ...
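As a quick check that the three loop styles above visit the same pairs, here is a small sketch using a hypothetical two-entry dict in place of the parsed JSON:

```python
# Hypothetical two-entry mapping standing in for the parsed JSON.
d = {"\u231A": ["time"], "\u2B50": ["star"]}

keys = list(d.keys())
values = list(d.values())

# Index-based pairing, zip-based pairing, and items() side by side.
by_index = [(keys[i], values[i]) for i in range(len(keys))]
by_zip = list(zip(d.keys(), d.values()))
by_items = list(d.items())

print(by_index == by_zip == by_items)  # True
```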

The json module makes reading and writing JSON much simpler (and safer), and with the idiom above the problem reduces to this:

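import json

# Read the JSON as UTF-8 text (Python 3) so every emoji key is kept as-is.
with open('emoji.json', encoding='utf-8') as fp:
    emojis = json.load(fp)

with open('output.py', 'w', encoding='utf-8') as fp:
    for k, v in emojis['emojis'][0].items():
        # Use the first description, or an empty string if the list is empty.
        fp.write('word = word.replace("{0}", "{1}")\n'.format(k, v[0] if v else ""))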
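If the generated replace lines are only an intermediate step, the mapping could also be applied directly. The sketch below is an illustration only: `replace_emojis` is a hypothetical helper, and the inline JSON string stands in for the real file. Longer keys are replaced first, on the assumption that multi-code-point sequences should match before their component characters.

```python
import json

# Hypothetical inline JSON standing in for the emoji file.
raw = '{"emojis": [{"\\u231b": ["wait"], "\\u2b50": ["star"], "\\u24c2": []}]}'
mapping = json.loads(raw)["emojis"][0]

def replace_emojis(word, mapping):
    """Replace each emoji key in word with its first description (or nothing)."""
    # Longer keys first, so a sequence is matched before the characters inside it.
    for key in sorted(mapping, key=len, reverse=True):
        names = mapping[key]
        word = word.replace(key, names[0] if names else "")
    return word

print(replace_emojis("please \u231b for the \u2b50", mapping))  # please wait for the star
```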
answered Nov 01 '22 21:11 by thebjorn