
How to compare and convert emoji characters in C#

I am trying to figure out how to check whether a string contains a specific emoji. For example, look at the following two emoji:

Bicyclist: http://unicode.org/emoji/charts/full-emoji-list.html#1f6b4

US Flag: http://unicode.org/emoji/charts/full-emoji-list.html#1f1fa_1f1f8

Bicyclist is U+1F6B4, and the US flag is U+1F1FA U+1F1F8.

However, the emoji to check for are provided to me in an array like this, with just the hexadecimal code point values as strings:

var checkFor = new string[] {"1F6B4","1F1FA-1F1F8"};

How can I convert those array values into actual Unicode characters and check whether a string contains them?

I can get something working for the Bicyclist, but for the US flag I'm stumped.

For the Bicyclist, I'm doing the following:

const string comparisonStr = "..."; //some string containing text and emoji

var hexVal = Convert.ToInt32(checkFor[0], 16);
var strVal = Char.ConvertFromUtf32(hexVal);

//now I can successfully do the following check

var exists = comparisonStr.Contains(strVal);

But this will not work with the US Flag because of the multiple code points.

tbraun asked Oct 01 '15 19:10


1 Answer

You already got past the hard part. All that was missing is splitting each array value on the hyphen, converting each part to its code point, and concatenating the results before performing the check.

Here is a sample program that should work:

using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Some string containing text and emoji.
        const string comparisonStr = "bicyclist: \U0001F6B4, and US flag: \U0001F1FA\U0001F1F8";
        var checkFor = new string[] { "1F6B4", "1F1FA-1F1F8" };

        foreach (var searchStringInHex in checkFor)
        {
            // Split on '-', convert each hex code point to its UTF-16 string, and join the pieces.
            string searchString = string.Join(string.Empty, searchStringInHex.Split('-')
                .Select(hex => char.ConvertFromUtf32(Convert.ToInt32(hex, 16))));

            if (comparisonStr.Contains(searchString))
            {
                Console.WriteLine($"Found {searchStringInHex}!");
            }
        }
    }
}
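
If the check is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: the class name EmojiSearch and the methods FromHexSequence and ContainsEmoji are made up for illustration and are not part of the original program. It assumes every input is a hyphen-separated list of valid hexadecimal code points, and it searches with an explicit ordinal comparison so that the match is on exact UTF-16 code units.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class EmojiSearch
{
    // Converts a value such as "1F1FA-1F1F8" into the string those code points encode.
    // Assumes each part is a valid hex code point; char.ConvertFromUtf32 throws otherwise.
    public static string FromHexSequence(string hexSequence)
    {
        return string.Concat(
            hexSequence.Split('-')
                       .Select(hex => char.ConvertFromUtf32(
                           int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture))));
    }

    // Returns true when the text contains the emoji's exact code unit sequence.
    public static bool ContainsEmoji(string text, string hexSequence)
    {
        return text.IndexOf(FromHexSequence(hexSequence), StringComparison.Ordinal) >= 0;
    }
}
```

Both emoji lie above U+FFFF, so each code point becomes a surrogate pair: FromHexSequence("1F6B4") has a Length of 2, and FromHexSequence("1F1FA-1F1F8") has a Length of 4. Keep in mind that a plain substring search like this can also match inside a longer sequence, for example a bicyclist followed by a skin tone modifier, or a run of adjacent regional indicator symbols. If that matters for your data, you would need to compare whole text elements instead.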
sstan answered Sep 28 '22 00:09