
Working with emoji in R

Tags: r, emoji

I have a CSV file that contains a lot of emoji:

Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃

How can I read this file into R with read.csv() so that the emoji don't turn into black question marks?

(I want to track emoji usage over time 👽)

emehex asked Sep 25 '22 10:09


1 Answer

My console has a font that accepts those "characters":

txt <- "Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃"

Encoding(txt)
#[1] "UTF-8"
dput(txt)
#"Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

> tvec <- scan(text=txt, what="")
Read 13 items
> dput(tvec)
c("Person,", "Message,", "A,", "\U0001f609,", "A,", "How", "are", 
"you?,", "B,", "\U0001f64d", "Alright!,", "A,", "\U0001f483\U0001f483"
)

> which(tvec == '\U0001f609,')
[1] 4

When I instead read that text with scan() using sep=",", each field kept the space that follows the comma, so the equality test against the bare emoji failed. It succeeded once I compared against the two-character version (a space followed by the emoji):

> tvec <- scan(text=txt, what="", sep=",")
> which(tvec == '\U0001f609')
integer(0)
> dput(tvec)
c("Person", " Message", "", "A", " \U0001f609", "", "A", " How are you?", 
"", "B", " \U0001f64d Alright!", "", "A", " \U0001f483\U0001f483"
)
> which(tvec == " 😉")
[1] 5
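
The leading-space problem can also be avoided by trimming the fields before comparing. The following is only a sketch, assuming a UTF-8 locale and the same sample text as above; the names hits_scan, raw_fields and hits_trim are made up for illustration. It uses the strip.white argument of scan() and, alternatively, trimws():

```r
# Same sample text as in the transcript above, written with \U escapes.
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# Option 1: let scan() strip leading/trailing whitespace from each field.
tvec <- scan(text = txt, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)
hits_scan <- which(tvec == "\U0001f609")

# Option 2: keep the raw fields and trim them just before the comparison.
raw_fields <- scan(text = txt, what = "", sep = ",", quiet = TRUE)
hits_trim <- which(trimws(raw_fields) == "\U0001f609")
```

Either way the bare emoji can then be matched without including the space in the comparison string.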

This is with Courier New as the console/editor font on a Mac. For an explanation of the \U escapes used to represent Unicode characters, see ?Quotes in the base package.
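
For the read.csv() part of the question, something along these lines may work. It is a minimal sketch, not a tested recipe for every platform: it assumes the file is saved as UTF-8, it writes the sample to a temporary file (path is a stand-in for the real file name), and the cut-off at U+1F300 is an assumption that happens to cover the emoji in this sample rather than a complete definition of "emoji".

```r
# Hypothetical stand-in for the real file: write the sample as UTF-8 bytes.
csv_lines <- c(
  "Person, Message,",
  "A, \U0001f609,",
  "A, How are you?,",
  "B, \U0001f64d Alright!,",
  "A, \U0001f483\U0001f483"
)
path <- tempfile(fileext = ".csv")
writeLines(enc2utf8(csv_lines), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings that are read as UTF-8;
# strip.white = TRUE drops the space that follows each comma.
msgs <- read.csv(path, encoding = "UTF-8", strip.white = TRUE,
                 stringsAsFactors = FALSE)

# Split every message into Unicode code points and keep those at or above
# U+1F300 (assumed cut-off, see the note above), then tally them.
code_points <- unlist(lapply(msgs$Message, utf8ToInt))
emoji_points <- code_points[code_points >= 0x1F300]
emoji_counts <- table(intToUtf8(emoji_points, multiple = TRUE))
```

The trailing commas in the sample produce an extra, empty third column, which can simply be ignored. Tracking usage over time would additionally need a timestamp column, which the sample does not contain.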

IRTFM answered Sep 28 '22 07:09