replace emoji unicode symbol using regexp in javascript

Question

Emoji symbols are encoded in up to 3 or 4 bytes, so a single emoji can occupy 2 characters (UTF-16 code units) in a JavaScript string. For example, '😁wew😁'.length is 7. I want to find these symbols in my text and replace each one with a value that depends on its code. After reading SO, I came across the XRegExp library with its Unicode plugin, but I have not found a way to make it work.

var str = '😁wew😁';// \u1F601 symbol
var reg = XRegExp('[\u1F601-\u1F64F]', 'g'); //  /[ὠ1-ὤF]/g -doesn't make a lot of sense  
//var reg = XRegExp('[\uD83D\uDE01-\uD83D\uDE4F]', 'g'); //Range out of order in character class
//var reg = XRegExp('\p{L}', 'g'); //doesn't match my symbols
console.log(XRegExp.replace(str, reg, function(match){
   return encodeURIComponent(match);// here I want to have smth like that %F0%9F%98%84 to be able to map anything I want to this value and replace to it
}));

jsfiddle

I really don't want to brute-force the string looking for sequences of characters from my range. Could someone help me find a way to do this with a regular expression?

EDITED: I just came up with the idea of enumerating all the emoji symbols. It is better than brute force, but I am still looking for a better idea.

var reg = XRegExp('\uD83D\uDE01|\uD83D\uDE4F|...','g');

Adrien Parrochia · Accepted Answer

To remove all possible emojis:

new RegExp('[\u1000-\uFFFF]+', 'g');
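As a quick check of what this range covers, the sketch below applies the pattern to a few sample strings (the samples and variable names are illustrative, not from the answer). Because U+1000 to U+FFFF includes the surrogate range, the pattern does strip emoji, but it also strips every other character in that range, such as CJK text.

```javascript
// Pattern from the answer above: every UTF-16 code unit from U+1000 to U+FFFF.
const broad = new RegExp('[\u1000-\uFFFF]+', 'g');

// Sample inputs are illustrative only.
const stripped = '\uD83D\uDE01wew\uD83D\uDE01'.replace(broad, ''); // U+1F601 around "wew"
const cjk = '\u65E5\u672C\u8A9E abc'.replace(broad, '');           // three CJK characters, then " abc"
const latin = 'caf\u00E9'.replace(broad, '');                      // U+00E9 is below U+1000

console.log(stripped); // wew
console.log(cjk);      // " abc" (the CJK characters are removed as well)
console.log(latin);    // café
```

If only emoji should go, a narrower pattern is probably the safer choice.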

shuizhongyuemin · Answer

Maybe you can take a look at this article: http://crocodillon.com/blog/parsing-emoji-unicode-in-javascript

The emoji code points run from U+1F601 to U+1F64F.

Translated to JavaScript's UTF-16 escapes, that is \ud83d\ude01 to \ud83d\ude4f (the pattern below starts at \ude00, which is U+1F600).

The first code unit is always \ud83d.

So the regex is:

/\ud83d[\ude00-\ude4f]/g

Hope this helps.
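A minimal sketch of how the surrogate values can be derived and how the pattern can be used for the replacement the question asks about. The helper name `toSurrogates` is hypothetical; the arithmetic is the standard UTF-16 encoding of a code point above U+FFFF.

```javascript
// Convert a code point above U+FFFF into its UTF-16 surrogate pair.
// toSurrogates is a hypothetical helper name used only in this sketch.
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
  return [high, low];
}

const hex = (n) => n.toString(16);
console.log(toSurrogates(0x1F601).map(hex).join(' ')); // d83d de01
console.log(toSurrogates(0x1F64F).map(hex).join(' ')); // d83d de4f

// The pattern from the answer, applied with a replacer function.
const emoticons = /\ud83d[\ude00-\ude4f]/g;
const encoded = '\uD83D\uDE01wew\uD83D\uDE01'.replace(emoticons, (match) =>
  encodeURIComponent(match)
);
console.log(encoded); // %F0%9F%98%81wew%F0%9F%98%81
```

The percent-encoded string could then serve as a lookup key for whatever replacement value is needed.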

Andreas Zwettler · Answer

This is somewhat old, but I was looking into this problem ~~and it seems Bradley Momberger has posted a nice solution to it here: http://airhadoken.github.io/2015/04/22/javascript-string-handling-emoji.html~~

The regex he proposes is:

/[\uD800-\uDFFF]./ // This matches emoji

This regex matches the head (high) surrogate used by emojis and the character following it, which is assumed to be the tail (low) surrogate. Thus, all emojis should be matched correctly, and with

.replace(/[\uD800-\uDFFF]./g,'')

you should be able to remove all emojis.

Edit: Better regex found. The above regex misses some emojis.

But there is a Reddit post with a version for which I cannot find an emoji that is an exception to the rule. The Reddit post is here: https://www.reddit.com/r/tasker/comments/4vhf2f/how_to_regex_emojis_in_tasker_for_search_match_or/ And the regex is:

/[\uD83C-\uDBFF\uDC00-\uDFFF]+/

To match all occurrences, use the g modifier:

/[\uD83C-\uDBFF\uDC00-\uDFFF]+/g

Second Edit: As CodeToad pointed out correctly, ✨ is not recognized by the above Regex, because it's in the dingbats block (thanks to air_hadoken).
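The gap can be reproduced with a short sketch. The sample string and variable names are illustrative; the two patterns are the ones quoted earlier in this answer. Both work on surrogate code units only, so a BMP emoji such as ✨ (U+2728) passes through untouched.

```javascript
// Sample: "a", sparkles (U+2728), "b", grinning face (U+1F601), "c".
const sample = 'a\u2728b\uD83D\uDE01c';

const surrogatePlusNext = /[\uD800-\uDFFF]./g;           // first regex in this answer
const surrogateRun = /[\uD83C-\uDBFF\uDC00-\uDFFF]+/g;   // regex from the Reddit post

const afterFirst = sample.replace(surrogatePlusNext, '');
const afterSecond = sample.replace(surrogateRun, '');

// In both results U+1F601 is removed and U+2728 is still present.
console.log(afterFirst === 'a\u2728bc');  // true
console.log(afterSecond === 'a\u2728bc'); // true
```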

The lodash library came up with an excellent Emoji Regex block:

(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?)*

Kevin Scott nicely summarized what this regex covers in his blog post. Spoiler: it includes dingbats 🎉
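A minimal usage sketch of the pattern quoted above, copied into a regex literal with the g flag. The variable names and sample strings are assumptions for illustration. Note that the `[\ud800-\udbff][\udc00-\udfff]` alternative matches any surrogate pair, so non-emoji characters outside the BMP are matched as well.

```javascript
// The lodash-derived pattern from this answer, with the g flag added.
const emojiRe = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?)*/g;

// "a" + sparkles (U+2728) + "b" + grinning face (U+1F601) + "c"
// + a two-letter regional-indicator flag (U+1F1E9 U+1F1EA) + "d"
const mixed = 'a\u2728b\uD83D\uDE01c\uD83C\uDDE9\uD83C\uDDEAd';
const cleaned = mixed.replace(emojiRe, '');
console.log(cleaned); // abcd

// A zero-width-joiner sequence (man + ZWJ + woman + ZWJ + girl) is one match.
const family = '\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67';
const familyMatches = family.match(emojiRe);
console.log(familyMatches.length); // 1
```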

Jukka K. Korpela · Answer

The \u.... notation has four hex digits, no less, no more, so it can only represent code points up to U+FFFF. Unicode characters above that are represented as pairs of surrogate code points.
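The sketch below shows what the four-digit form does with the escape from the question. It also shows, for engines that support ES2015, the separate brace form `\u{...}` and the regex `u` flag, which address code points above U+FFFF directly; that part goes beyond what this answer describes and assumes an ES2015-capable runtime. Variable names are illustrative.

```javascript
// '\u1F601' is read as the escape \u1F60 followed by the literal digit "1".
const misparsed = '\u1F601';
console.log(misparsed.length);                     // 2
console.log(misparsed.charCodeAt(0).toString(16)); // 1f60
console.log(misparsed[1]);                         // 1

// ES2015 brace form: one code point, still stored as two UTF-16 code units.
const grin = '\u{1F601}';
console.log(grin.length);                      // 2
console.log(grin.codePointAt(0).toString(16)); // 1f601

// With the u flag, a character class can span astral code points.
const astralRange = /[\u{1F601}-\u{1F64F}]/gu;
const replaced = '\u{1F601}wew\u{1F601}'.replace(astralRange, (match) =>
  encodeURIComponent(match)
);
console.log(replaced); // %F0%9F%98%81wew%F0%9F%98%81
```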

So some indirect approach is needed. See also "JavaScript strings outside of the BMP".

For example, you could look for code points in the range [\uD800-\uDBFF] (high surrogates), and when you find one, check that the next code point in the string is in the range [\uDC00-\uDFFF] (if not, there is a serious data error), interpret the two as a Unicode character, and replace them by whatever you wish to put there. This looks like a job for a simple loop through the string, rather than a regular expression.
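The loop described above could be sketched as follows. The function name `replaceAstral` and its callback signature are hypothetical, and throwing on an unpaired high surrogate is one possible way to handle the "serious data error" case.

```javascript
// replaceAstral is a hypothetical helper: it walks the string one UTF-16 code
// unit at a time and hands every surrogate pair to the replacer callback.
function replaceAstral(str, replacer) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {          // high surrogate
      const next = str.charCodeAt(i + 1);            // NaN at end of string
      if (next >= 0xDC00 && next <= 0xDFFF) {        // low surrogate
        const codePoint = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
        out += replacer(str.slice(i, i + 2), codePoint);
        i++;                                         // skip the low surrogate
        continue;
      }
      throw new Error('Unpaired high surrogate at index ' + i);
    }
    out += str[i];                                   // ordinary code unit, copied as-is
  }
  return out;
}

const result = replaceAstral('\uD83D\uDE01wew\uD83D\uDE01', (pair, codePoint) =>
  encodeURIComponent(pair)
);
console.log(result); // %F0%9F%98%81wew%F0%9F%98%81
```

The callback also receives the decoded code point, so the replacement can be looked up by number instead of by the encoded string.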

Tags: javascript, regex, unicode, emoji

Fedor Skrynnikov
