Note: this question could look odd on systems not supporting the included emoji.
This is a follow-up question to How do I remove emoji from string.
I want to build a regular expression that matches all emoji that can be entered in Mac OS X / iOS.
The obvious Unicode blocks cover most, but not all of these emoji:
Wikipedia provides a compiled list of all the symbols available in Apple Color Emoji on OS X Mountain Lion and iOS 6, which looks like a good starting point: (slightly updated)
people = '๐๐๐๐โบ๏ธ๐๐๐๐๐๐๐๐๐๐ณ๐๐๐๐๐๐ฃ๐ข๐๐ญ๐ช๐ฅ๐ฐ๐
๐๐ฉ๐ซ๐จ๐ฑ๐ ๐ก๐ค๐๐๐๐ท๐๐ด๐ต๐ฒ๐๐ฆ๐ง๐๐ฟ๐ฎ๐ฌ๐๐๐ฏ๐ถ๐๐๐๐ฒ๐ณ๐ฎ๐ท๐๐ถ๐ฆ๐ง๐จ๐ฉ๐ด๐ต๐ฑ๐ผ๐ธ๐บ๐ธ๐ป๐ฝ๐ผ๐๐ฟ๐น๐พ๐น๐บ๐๐๐๐๐ฝ๐ฉ๐ฅโจ๐๐ซ๐ฅ๐ข๐ฆ๐ง๐ค๐จ๐๐๐๐
๐๐๐๐๐โโ๐โ๐๐๐๐๐๐๐โ๐๐ช๐ถ๐๐๐ซ๐ช๐ฌ๐ญ๐๐๐ฏ๐๐
๐๐๐๐๐
๐ฐ๐๐๐๐ฉ๐๐๐๐๐ก๐ ๐ข๐๐๐๐๐ฝ๐๐๐๐ผ๐๐๐๐๐๐๐๐๐๐๐โค๐๐๐๐๐๐๐๐๐๐๐๐ค๐ฅ๐ฌ๐ฃ๐ญ'
nature = '๐ถ๐บ๐ฑ๐ญ๐น๐ฐ๐ธ๐ฏ๐จ๐ป๐ท๐ฝ๐ฎ๐๐ต๐๐ด๐๐๐ผ๐ง๐ฆ๐ค๐ฅ๐ฃ๐๐๐ข๐๐๐๐๐๐๐๐ ๐๐ฌ๐ณ๐๐๐๐๐๐
๐๐๐๐๐๐๐๐๐๐ฒ๐ก๐๐ซ๐ช๐๐๐ฉ๐พ๐๐ธ๐ท๐๐น๐ป๐บ๐๐๐๐ฟ๐พ๐๐ต๐ด๐ฒ๐ณ๐ฐ๐ฑ๐ผ๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐ โญโโ
โโกโโโ๐๐๐๐'
objects = '๐๐๐๐๐๐๐๐๐๐๐๐ป๐
๐๐๐๐๐๐๐๐ฎ๐ฅ๐ท๐น๐ผ๐ฟ๐๐ฝ๐พ๐ป๐ฑโ๐๐๐ ๐ก๐บ๐ป๐๐๐๐๐๐๐ข๐ฃโณโโฐโ๐๐๐๐๐๐๐ก๐ฆ๐๐
๐๐๐๐๐๐ฟ๐ฝ๐ง๐ฉ๐จ๐ช๐ฌ๐ฃ๐ซ๐ช๐๐๐ฐ๐ด๐ต๐ท๐ถ๐ณ๐ธ๐ฒ๐ง๐ฅ๐คโ๐ฉ๐จ๐ฏ๐ซ๐ช๐ฌ๐ญ๐ฎ๐ฆ๐๐๐๐๐๐๐๐๐๐
๐๐๐๐โ๐๐โโ๐๐๐๐๐๐๐๐๐๐๐๐๐๐ฌ๐ญ๐ฐ๐จ๐ฌ๐ค๐ง๐ผ๐ต๐ถ๐น๐ป๐บ๐ท๐ธ๐พ๐ฎ๐๐ด๐๐ฒ๐ฏ๐๐โฝโพ๐พ๐ฑ๐๐ณโณ๐ต๐ด๐๐๐๐ฟ๐๐๐๐ฃโ๐ต๐ถ๐ผ๐บ๐ป๐ธ๐น๐ท๐ด๐๐๐๐๐๐๐๐ค๐ฑ๐ฃ๐ฅ๐๐๐๐๐ฒ๐ข๐ก๐ณ๐๐ฉ๐ฎ๐ฆ๐จ๐ง๐๐ฐ๐ช๐ซ๐ฌ๐ญ๐ฏ๐๐๐๐๐๐๐๐๐๐๐๐๐๐ ๐๐
๐ฝ'
places = '๐ ๐ก๐ซ๐ข๐ฃ๐ฅ๐ฆ๐ช๐ฉ๐จ๐โช๐ฌ๐ค๐๐๐ฏ๐ฐโบ๐ญ๐ผ๐พ๐ป๐๐
๐๐ฝ๐๐ ๐กโฒ๐ข๐ขโต๐ค๐ฃโ๐โ๐บ๐๐๐๐๐๐๐๐
๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐จ๐๐๐๐๐๐ฒ๐ก๐๐ ๐๐๐๐ซ๐ฆ๐ฅโ ๐ง๐ฐโฝ๐ฎ๐ฐโจ๐ฟ๐ช๐ญ๐๐ฉ๐ฏ๐ต๐ฐ๐ท๐ฉ๐ช๐จ๐ณ๐บ๐ธ๐ซ๐ท๐ช๐ธ๐ฎ๐น๐ท๐บ๐ฌ๐ง'
symbols = '1๏ธโฃ2๏ธโฃ3๏ธโฃ4๏ธโฃ5๏ธโฃ6๏ธโฃ7๏ธโฃ8๏ธโฃ9๏ธโฃ0๏ธโฃ๐๐ข#๏ธโฃ๐ฃโฌ๏ธโฌ๏ธโฌ
๏ธโก๏ธ๐ ๐ก๐คโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธ๐โ๏ธโถ๏ธ๐ผ๐ฝโฉ๏ธโช๏ธโน๏ธโชโฉโซโฌโคต๏ธโคด๏ธ๐๐๐๐๐๐๐๐๐๐ถ๐ฆ๐๐ฏ๐ณ๐ต๐ด๐ฒ๐๐น๐บ๐ถ๐๐ป๐น๐บ๐ผ๐พ๐ฐ๐ฎ๐
ฟ๏ธโฟ๏ธ๐ญ๐ท๐ธ๐โ๏ธ๐๐๐
๐๐ใ๏ธใ๏ธ๐๐๐๐ซ๐๐ต๐ฏ๐ฑ๐ณ๐ท๐ธโโณ๏ธโ๏ธโโ
โด๏ธ๐๐๐ณ๐ด๐
ฐ๐
ฑ๐๐
พ๐ โฟโป๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๏ธโ๐ฏ๐ง๐น๐ฒ๐ฑยฉ๏ธยฎ๏ธโข๏ธโโผ๏ธโ๏ธโโโโโญ๐๐๐๐๐๐๐๐ง๐๐๐๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐ก๐ข๐ฃ๐ค๐ฅ๐ฆโ๏ธโโโโ โฅโฃโฆ๐ฎ๐ฏโโ๐๐โฐใฐใฝ๏ธ๐ฑโผ๏ธโป๏ธโพ๏ธโฝ๏ธโช๏ธโซ๏ธ๐บ๐ฒ๐ณโซ๏ธโช๏ธ๐ด๐ต๐ปโฌ๏ธโฌ๏ธ๐ถ๐ท๐ธ๐น'
emoji = people + nature + objects + places + symbols # all emoji combined
Most characters have a single code point and converting these would be easy:
But some characters are "encoded using two Unicode values":
And some even have 3 codepoints:
(Variation Selector 16 means "emoji style")
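The three cases above can be inspected with a few lines of JavaScript. This is only a sketch: `codePoints` is a hypothetical helper name, and the three sample strings (a face, a flag built from two regional indicators, and a keycap digit) are illustrative choices written as escapes so they survive on systems without emoji support.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit,
  // so characters outside the BMP are not split into surrogate halves.
  return [...str].map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const single = '\u{1F600}';          // one code point
const pair = '\u{1F1EF}\u{1F1F5}';   // two regional indicator symbols (J + P)
const keycap = '1\u{FE0F}\u{20E3}';  // digit + Variation Selector 16 + combining keycap

console.log(codePoints(single)); // [ 'U+1F600' ]
console.log(codePoints(pair));   // [ 'U+1F1EF', 'U+1F1F5' ]
console.log(codePoints(keycap)); // [ 'U+0031', 'U+FE0F', 'U+20E3' ]
```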
How can I split this list into characters (without splitting combined characters), find their code point(s) and finally build a regular expression matching them?
The regex doesn't have to respect "missing" characters within larger blocks, i.e. it's okay if the 4 Unicode blocks mentioned above are entirely covered.
(I'm going to answer this myself if I don't get any answers, but maybe there's an easy solution)
Much the way you can match accented characters, you can use Unicode property escapes to match emoji: I've previously seen massive arrays of every emoji ever created, and it is possible that \p{Emoji_Presentation} doesn't contain all emoji across all devices, but this regex has matched every case I've come across. Happy emoji....ing!
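A minimal sketch of the property-escape approach in JavaScript, assuming an engine that supports `\p{...}` with the `u` flag. The sample string and variable names are made up for illustration. Note that characters which default to text presentation, such as U+2764 HEAVY BLACK HEART, are not covered by Emoji_Presentation.

```javascript
// Match characters that default to emoji presentation.
const emojiPattern = /\p{Emoji_Presentation}/gu;

// Illustrative input: U+1F680 (rocket) and U+1F600 (grinning face).
const sample = 'Launch \u{1F680} at noon \u{1F600}!';

const found = sample.match(emojiPattern);          // two matches
const stripped = sample.replace(emojiPattern, ''); // 'Launch  at noon !'

console.log(found.length, JSON.stringify(stripped));

// U+2764 is an emoji but defaults to text presentation, so it is not matched.
console.log(/\p{Emoji_Presentation}/u.test('\u{2764}')); // false
```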
The new expression is xvect[grepl('[\U{1F300}-\U{1F6FF}]', xvect)]. The range in the character class runs from U+1F300 to U+1F6FF. One can of course change this to a different range when an emoji lies outside it.
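The same filtering idea, sketched in JavaScript rather than R. The array contents are invented for illustration, and the last element shows the limitation mentioned above: U+2B50 lies outside the U+1F300 to U+1F6FF range, so it is not detected.

```javascript
// Single range taken from the answer above: U+1F300 through U+1F6FF.
const blockRange = /[\u{1F300}-\u{1F6FF}]/u;

// Illustrative data; the name mirrors the R vector in the answer.
const xvect = [
  'plain text',
  'sunny \u{1F31E}',   // inside the range
  'rocket \u{1F680}',  // inside the range
  'star \u{2B50}',     // outside the range, so not kept
];

// Keep only the elements that contain a character in the range.
const withEmoji = xvect.filter((s) => blockRange.test(s));

console.log(withEmoji.length); // 2
```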
Emojis displayed on iPhone, iPad, Mac, Apple Watch and Apple TV use the Apple Color Emoji font installed on iOS, macOS, watchOS and tvOS. Some Apple devices support Animoji and Memoji. Two Private Use Area characters are not cross-platform compatible but do work on Apple devices. 117 new emojis are now available in iOS 14.2 and macOS 11 Big Sur.
iOS 15.0 will not include any new emojis from Emoji 14.0, the latest set of emoji recommendations made in September 2021. Support for Emoji 14.0 on Apple platforms is expected in the first half of 2022.
The upcoming Unicode Emoji data files would help with this. At the moment these are still drafts, but they might still help you out.
By parsing http://www.unicode.org/Public/emoji/1.0/emoji-data.txt you could quite easily get a list of all emoji in the Unicode standard. (Note that some of these emoji consist of multiple code points.) Once you have such a list, it's trivial to turn it into a regular expression.
Here's a JavaScript version: https://github.com/mathiasbynens/emoji-regex/blob/master/index.js
And here's the script that generates it based on the data from emoji-data.txt: https://github.com/mathiasbynens/emoji-regex/blob/master/scripts/generate-regex.js
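To make the "parse, then build" step concrete, here is a rough sketch. It is not the linked generate-regex.js script. The input format is an assumption (the first `;`-delimited field holds a single hex code point, a space-separated sequence, or a `start..end` range, with `#` starting a comment) and should be checked against the actual data file. `sampleData`, `parseEntries` and `buildRegex` are hypothetical names.

```javascript
// Assumed, simplified stand-in for emoji-data.txt (format not verified).
const sampleData = `
# comment line
1F600..1F64F ; Emoji # a range of faces
0023 20E3 ; Emoji # a two-code-point sequence
00A9 ; Emoji # a single code point
`;

// Turn each data line into either a range or a code point sequence.
function parseEntries(text) {
  const entries = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    if (line === '') continue;
    const field = line.split(';')[0].trim();   // first field only
    if (field.includes('..')) {
      const [start, end] = field.split('..').map((h) => parseInt(h, 16));
      entries.push({ range: [start, end] });
    } else {
      entries.push({ sequence: field.split(/\s+/).map((h) => parseInt(h, 16)) });
    }
  }
  return entries;
}

// Write a code point as a \u{...} escape for use in a regex source string.
const esc = (cp) => `\\u{${cp.toString(16).toUpperCase()}}`;

function buildRegex(entries) {
  const classParts = []; // single code points and ranges go in one class
  const sequences = [];  // multi-code-point emoji become alternatives
  for (const e of entries) {
    if (e.range) classParts.push(`${esc(e.range[0])}-${esc(e.range[1])}`);
    else if (e.sequence.length === 1) classParts.push(esc(e.sequence[0]));
    else sequences.push(e.sequence.map(esc).join(''));
  }
  // Sequences are listed before the class so they are tried first.
  const parts = [...sequences];
  if (classParts.length > 0) parts.push(`[${classParts.join('')}]`);
  return new RegExp(parts.join('|'), 'gu');
}

const emojiRegex = buildRegex(parseEntries(sampleData));
const matches = '#\u{20E3} \u{1F600} \u{A9} x'.match(emojiRegex);
console.log(matches.length); // 3
```

Putting the sequences ahead of the character class is a design choice: it keeps a multi-code-point emoji from being matched piecewise when one of its code points also appears in the class.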
This regex matches all 845 emoji, taken from Emoji unicode characters for use on the web:
[\u{203C}\u{2049}\u{20E3}\u{2122}\u{2139}\u{2194}-\u{2199}\u{21A9}-\u{21AA}\u{231A}-\u{231B}\u{23E9}-\u{23EC}\u{23F0}\u{23F3}\u{24C2}\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}\u{2600}-\u{2601}\u{260E}\u{2611}\u{2614}-\u{2615}\u{261D}\u{263A}\u{2648}-\u{2653}\u{2660}\u{2663}\u{2665}-\u{2666}\u{2668}\u{267B}\u{267F}\u{2693}\u{26A0}-\u{26A1}\u{26AA}-\u{26AB}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}-\u{26F3}\u{26F5}\u{26FA}\u{26FD}\u{2702}\u{2705}\u{2708}-\u{270C}\u{270F}\u{2712}\u{2714}\u{2716}\u{2728}\u{2733}-\u{2734}\u{2744}\u{2747}\u{274C}\u{274E}\u{2753}-\u{2755}\u{2757}\u{2764}\u{2795}-\u{2797}\u{27A1}\u{27B0}\u{2934}-\u{2935}\u{2B05}-\u{2B07}\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}\u{3030}\u{303D}\u{3297}\u{3299}\u{1F004}\u{1F0CF}\u{1F170}-\u{1F171}\u{1F17E}-\u{1F17F}\u{1F18E}\u{1F191}-\u{1F19A}\u{1F1E7}-\u{1F1EC}\u{1F1EE}-\u{1F1F0}\u{1F1F3}\u{1F1F5}\u{1F1F7}-\u{1F1FA}\u{1F201}-\u{1F202}\u{1F21A}\u{1F22F}\u{1F232}-\u{1F23A}\u{1F250}-\u{1F251}\u{1F300}-\u{1F320}\u{1F330}-\u{1F335}\u{1F337}-\u{1F37C}\u{1F380}-\u{1F393}\u{1F3A0}-\u{1F3C4}\u{1F3C6}-\u{1F3CA}\u{1F3E0}-\u{1F3F0}\u{1F400}-\u{1F43E}\u{1F440}\u{1F442}-\u{1F4F7}\u{1F4F9}-\u{1F4FC}\u{1F500}-\u{1F507}\u{1F509}-\u{1F53D}\u{1F550}-\u{1F567}\u{1F5FB}-\u{1F640}\u{1F645}-\u{1F64F}\u{1F680}-\u{1F68A}]
Examples can be found here: https://stackoverflow.com/a/29115920/1911674
EDIT: I updated the regex to exclude ASCII numbers and symbols. See comments from How do I remove emoji from string for details.
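As a usage sketch in JavaScript, the block below strips matches from a string using a shortened class made of three ranges copied from the expression above; the full class would be used the same way. The sample input is invented. In JavaScript the `u` flag is needed, because without it the `\u{...}` escapes are not read as code points.

```javascript
// Three ranges taken from the full character class; shortened for readability.
const emojiSubset = /[\u{2600}-\u{2601}\u{1F300}-\u{1F320}\u{1F5FB}-\u{1F640}]/gu;

// Illustrative input: U+2600 (sun), U+1F319 (crescent moon), U+1F600 (face).
const input = 'Sun \u{2600}, moon \u{1F319}, smile \u{1F600}, digit 1';

// Remove every matched character; ASCII digits and punctuation are untouched.
const cleaned = input.replace(emojiSubset, '');

console.log(JSON.stringify(cleaned)); // "Sun , moon , smile , digit 1"
```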