I am hoping to identify which emojis are used most in a text conversation using SQLite. I am using DB Browser, and the emojis show up like they do in iMessage (see the picture below), but I am stumped on how to count them.
I was thinking that if there were a way to check whether a character is not a letter, number, or punctuation mark, I could count the frequency of all characters that fall outside that list. That said, I am unfamiliar with SQLite commands and don't know how to accomplish that.
Is there a better way to go about this? Let me know if you need more context to answer this question.

The only way I can see to do this with SQLite directly would be to compile SQLite from source so that you could add support for regex_replace.
However, if you only plan to do this once, recompiling SQLite might be overkill.
Instead, you could copy your text column into a plain text file, and run the following command:
sed 's/\(.\)/\1\n/g' temp.txt | sed 's/[[:alnum:].-]//g' | sort -r | uniq -c
This would turn the following:
Hello! Are you stuck? š¤
I saw š»š»š» in the park!!!!!
šššššš - all lies.
Easy as 123! ššššššššššš
into:
1 š¤
11 š
3 š»
6 š
1 ?
7 !
17
50
This should hopefully be close enough to get you to your goal. The last two entries are for tabs and spaces.
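If you would rather skip the export to a text file, the same per-character counting can be sketched in Python with the standard sqlite3 module. This is only a rough sketch under some assumptions: the table name `messages` and column name `text` are hypothetical placeholders for your own schema, and "emoji" is approximated as any character whose Unicode general category is "So" (Symbol, other), which covers most emoji but also a few non-emoji symbols such as ©. Emoji built from several code points (skin tones, joined sequences) would be counted by their base characters only.

```python
import sqlite3
import unicodedata
from collections import Counter


def count_symbols(texts):
    """Count characters whose Unicode category is 'So' (Symbol, other)."""
    counts = Counter()
    for text in texts:
        for ch in text:
            # Letters, digits, punctuation, and whitespace all have other
            # categories, so they are skipped here.
            if unicodedata.category(ch) == "So":
                counts[ch] += 1
    return counts


def count_symbols_in_table(conn, table="messages", column="text"):
    """Count symbols in one text column of an open SQLite connection.

    The default table and column names are hypothetical; pass your own.
    The names are interpolated into the SQL, so only use trusted values.
    """
    query = f"""SELECT "{column}" FROM "{table}" WHERE typeof("{column}") = 'text'"""
    rows = conn.execute(query)
    return count_symbols(row[0] for row in rows)
```

To use it, open your database with `sqlite3.connect(...)`, call `count_symbols_in_table(conn, "your_table", "your_column")`, and read the result with `.most_common()`, which lists the characters from most to least frequent.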
sed is a Linux command, so if you are running Windows you may want to get a Windows version here: https://github.com/mbuilov/sed-windows