
Using collections.Counter to count emojis with different colors

I would like to use the collections.Counter class to count emojis in a string. It generally works fine; however, when I introduce colored emojis, the color component of each emoji is separated from the emoji itself, like so:

>>> import collections
>>> emoji_string = "👌🏻👌🏼👌🏽👌🏾👌🏿"
>>> emoji_counter = collections.Counter(emoji_string)
>>> emoji_counter.most_common()
[('👌', 5), ('🏻', 1), ('🏼', 1), ('🏽', 1), ('🏾', 1), ('🏿', 1)]

How can I make the most_common() function return something like this instead:

[('👌🏻', 1), ('👌🏼', 1), ('👌🏽', 1), ('👌🏾', 1), ('👌🏿', 1)]

I'm using Python 3.6.

asked May 08 '17 16:05 by Toni Sučić





1 Answer

You'll have to split your string into separate clusters. Each of your emoji is really two codepoints, the emoji itself followed by an EMOJI MODIFIER FITZPATRICK TYPE-X codepoint:

>>> print(emoji_string[0])
👌
>>> print(emoji_string[1])
🏻
>>> print(emoji_string[:2])
👌🏻
>>> print(ascii(emoji_string[:2]))
'\U0001f44c\U0001f3fb'
>>> import unicodedata
>>> unicodedata.name(emoji_string[1])
'EMOJI MODIFIER FITZPATRICK TYPE-1-2'
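To see the decomposition of the whole string rather than one index at a time, a small loop over the codepoints with `unicodedata.name()` can help. This is only an illustrative sketch; the string is spelled with `\U` escapes (an assumption, so the example survives copy and paste) and is built to match the question's five emoji:

```python
import unicodedata

# The question's string spelled with escapes: OK HAND SIGN followed by
# each of the five Fitzpatrick skin-tone modifiers (U+1F3FB..U+1F3FF).
emoji_string = "".join("\U0001f44c" + chr(cp) for cp in range(0x1F3FB, 0x1F400))

# Each visible emoji contributes two codepoints, so the length is 10, not 5.
print(len(emoji_string))

# List every codepoint with its hex value and its Unicode name.
for ch in emoji_string:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Since `Counter` iterates over the string codepoint by codepoint, this listing is exactly what it sees, which is why the modifiers end up counted on their own.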

You could use a regular expression to keep each modifier together with the preceding emoji:

import re

# Any one character, optionally followed by a skin-tone modifier (U+1F3FB to U+1F3FF)
char_with_modifier = re.compile(r'(.[\U0001f3fb-\U0001f3ff]?)')
split_emoji = char_with_modifier.findall(emoji_string)

and count the result.
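If you'd rather avoid a regular expression, the same grouping can be sketched with a plain loop that attaches each modifier to the character before it. The helper name `split_with_modifiers` is hypothetical, and like the regex it only knows about the five Fitzpatrick modifiers; it is not a general grapheme-cluster splitter:

```python
from collections import Counter

# The five skin-tone modifiers, U+1F3FB through U+1F3FF.
FITZPATRICK = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

def split_with_modifiers(text):
    """Hypothetical helper: split text into characters, keeping each
    skin-tone modifier attached to the character that precedes it."""
    parts = []
    for ch in text:
        if ch in FITZPATRICK and parts:
            parts[-1] += ch  # glue the modifier onto the previous character
        else:
            parts.append(ch)  # ordinary character (or a modifier at the start)
    return parts

emoji_string = "\U0001f44c\U0001f3fb\U0001f44c\U0001f3fc"
print(Counter(split_with_modifiers(emoji_string)).most_common())
```

For the question's input this groups the characters the same way the regex does; the regex is more compact, while the loop makes the "attach to the previous character" rule explicit.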

Demo:

>>> import re
>>> from collections import Counter
>>> emoji_string = "👌🏻👌🏼👌🏽👌🏾👌🏿"
>>> char_with_modifier = re.compile(r'(.[\U0001f3fb-\U0001f3ff]?)')
>>> Counter(char_with_modifier.findall(emoji_string))
Counter({'👌🏻': 1, '👌🏼': 1, '👌🏽': 1, '👌🏾': 1, '👌🏿': 1})
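Because the result is an ordinary `Counter`, `most_common()` now produces the shape the question asked for. The sketch below uses a made-up sample string (an assumption, not from the question) that mixes a plain 👌 with toned ones, to show that plain and toned variants are counted as separate keys:

```python
import re
from collections import Counter

# One character, optionally followed by a Fitzpatrick skin-tone modifier.
char_with_modifier = re.compile(r'(.[\U0001f3fb-\U0001f3ff]?)')

# Hypothetical sample: a plain OK hand, then toned variants (TYPE-1-2 twice).
sample = "\U0001f44c\U0001f44c\U0001f3fb\U0001f44c\U0001f3fd\U0001f44c\U0001f3fb"

counts = Counter(char_with_modifier.findall(sample))
for emoji, n in counts.most_common():
    print(emoji, n)
```

Here the plain 👌 is not followed by a modifier, so the optional part of the pattern matches nothing and it is counted on its own.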
answered Sep 25 '22 10:09 by Martijn Pieters
