How can I match emoji with an R regex?

Tags: regex, r, utf-16, emoji

I want to determine which elements of my vector contain emoji:

x = c('😂', 'no', '🍹', '😀', 'no', '😛', '䨺', '감사')
x
# [1] "\U0001f602" "no"         "\U0001f379" "\U0001f600" "no"         "\U0001f61b" "䨺"         "감사"

Related posts only cover other languages, and because mostly they refer to specialized libraries, I couldn't figure out a way to translate to R:

What is the regex to extract all the emojis from a string?
How do I remove emoji from string
replace emoji unicode symbol using regexp in javascript
Regular expression matching emoji in Mac OS X / iOS
remove unicode emoji using re in python

The second looked very promising, but alas (not fixed by supplying perl = TRUE):

x[grepl('[\u{1F600}-\u{1F6FF}]', x)]

Error: invalid \u{xxxx} sequence (line 1)

Similar issues come about from other questions. How can we match emoji in R?

asked Apr 12 '17 02:04

MichaelChirico

1 Answer

The idea is to mark the strings as UTF-8 so that each emoji can be compared against the emoji in the remoji package, which are also stored as UTF-8. I use the stringr package to find the positions of the emoji in the vector, but grep or any similar function would work as well.

1st Method:

library(stringr)
xvect = c('😂', 'no', '🍹', '😀', 'no', '😛')

Encoding(xvect) <- "UTF-8"

which(str_detect(xvect, "[^[:ascii:]]"))  # positions of elements containing any non-ASCII character
# [1] 1 3 4 6

Here elements 1, 3, 4 and 6 are the emoji.

Edit:

2nd Method: Install the remoji package from GitHub using devtools with the commands below. Since the emoji have already been marked as UTF-8, we can compare them against the UTF-8 values of all the emoji in the remoji package. Use trimws to strip the surrounding whitespace.

install.packages("devtools")

devtools::install_github("richfitz/remoji")
library(remoji)
emj <- emoji(list_emoji(), TRUE)
xvect %in% trimws(emj)

Output:

which(xvect %in% trimws(emj))
# [1] 1 3 4 6

Neither method is foolproof. The first assumes that the only non-ASCII characters in the vector are emoji, so any other non-ASCII text (for example '䨺' or '감사' from the question) would also be flagged. The second depends on the emoji list shipped with remoji: if a particular emoji is missing from the package, the last command returns FALSE instead of TRUE.
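To make the limitation of the first method concrete, here is a minimal sketch run on the full vector from the question. It uses a base R stand-in for the stringr pattern (a code point check via utf8ToInt rather than "[^[:ascii:]]"), so it is an illustration of the same idea, not the exact call from the answer. The characters are written as escapes only to keep the snippet independent of the file encoding.

```r
# Vector from the question, written with Unicode escapes.
x <- c("\U0001F602", "no", "\U0001F379", "\U0001F600", "no", "\U0001F61B",
       "\u4A3A",          # a CJK ideograph
       "\uAC10\uC0AC")    # Hangul text

# Base R stand-in for the non-ASCII test: TRUE when any code point is above 127.
non_ascii <- vapply(
  x,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# The CJK and Hangul elements are flagged along with the emoji.
which(non_ascii)
# [1] 1 3 4 6 7 8
```

Elements 7 and 8 are not emoji, yet they pass the non-ASCII test, which is why this check only works on vectors whose other elements are plain ASCII.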

Final Edit:

Following the discussion between the OP (@MichaelChirico) and @SymbolixAU, and with thanks to both, it seems the problem was a small typo: the escape needs a capital U (\u accepts at most four hex digits, whereas \U accepts up to eight). The new regex is xvect[grepl('[\U{1F300}-\U{1F6FF}]', xvect)]. The range in the character class runs from U+1F300 to U+1F6FF. One can of course change or extend this range if an emoji lies outside it. It may not cover every emoji, and the relevant ranges may grow or change over time.
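As a rough sketch of how the range could be extended, the snippet below wraps the regex in a small helper. The name has_emoji and the extra block U+1F900 to U+1F9FF (Supplemental Symbols and Pictographs) are my own additions for illustration, not part of the discussion above, and the pattern is still far from a complete emoji list. Matching supplementary-plane ranges may also behave differently depending on platform and R version.

```r
# Hypothetical helper (not from any package): TRUE for elements that contain
# a code point in one of the listed blocks.
#   U+1F300-U+1F6FF : the range used in the answer
#   U+1F900-U+1F9FF : Supplemental Symbols and Pictographs (added as an assumption)
emoji_pattern <- "[\U0001F300-\U0001F6FF\U0001F900-\U0001F9FF]"

has_emoji <- function(x) {
  grepl(emoji_pattern, enc2utf8(x))
}

# Vector from the question, written with Unicode escapes.
x <- c("\U0001F602", "no", "\U0001F379", "\U0001F600", "no", "\U0001F61B",
       "\u4A3A", "\uAC10\uC0AC")

# Positions of the elements that contain an emoji from the listed blocks.
which(has_emoji(x))

# U+1F914 lies outside the original range but inside the added block.
has_emoji("\U0001F914")
```

Keeping the ranges in a single pattern string makes it easy to append further blocks later without touching the calling code.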

answered Sep 22 '22 19:09

PKumar
