I have a csv file that contains a lot of emoji:
Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃
How can I read this into R with read.csv()
so that the emoji don't turn into black question marks?
(I want to track emoji usage over time 👽)
My console has a font that accepts those "characters":
txt <- "Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃"
Encoding(txt)
#[1] "UTF-8"
dput(txt)
#"Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"
> tvec <- scan(text=txt, what="")
Read 13 items
> dput(tvec)
c("Person,", "Message,", "A,", "\U0001f609,", "A,", "How", "are",
"you?,", "B,", "\U0001f64d", "Alright!,", "A,", "\U0001f483\U0001f483"
)
> which(tvec == '\U0001f609,')
[1] 4
When I instead used scan to read that text with a comma separator (sep=","), the leading space in each field kept the equality test against the bare emoji from succeeding, but the test succeeded once I included the space, i.e. the two-character version:
> which(tvec == '\U0001f609')
integer(0)
> dput(tvec)
c("Person", " Message", "", "A", " \U0001f609", "", "A", " How are you?",
"", "B", " \U0001f64d Alright!", "", "A", " \U0001f483\U0001f483"
)
> which(tvec == " 😉")
[1] 5
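One way to sidestep the leading-space problem is to let scan trim each field itself. This is only a small sketch: it rebuilds the same txt with \U escapes so it runs on its own, assumes a UTF-8 locale like the one above, and relies on the strip.white and quiet arguments of scan.

```r
# Same sample text as above, written with \U escapes so the block is self-contained.
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# strip.white = TRUE removes leading/trailing blanks from each comma-separated field.
tvec <- scan(text = txt, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)

# The bare emoji now matches without having to include the leading space.
hits <- which(tvec == "\U0001f609")
```

With the fields trimmed, hits points at the same element as the two-character test did.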
This is with Courier New as the console/editor font on a Mac. For an explanation of the Unicode escape representations, see ?Quotes {base}.
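For the file case in the question, something along these lines may work. It is a sketch rather than a tested recipe for every platform: csv_path is a hypothetical temporary file standing in for the real csv, the file is assumed to be saved as UTF-8, and encoding = "UTF-8" only tells read.csv to mark the strings as UTF-8 (a file in another encoding would probably need fileEncoding instead). Whether the emoji actually display still depends on the console font.

```r
# Hypothetical stand-in for the real file: write the sample rows out as UTF-8 bytes.
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"
csv_path <- tempfile(fileext = ".csv")
writeLines(enc2utf8(txt), csv_path, useBytes = TRUE)

# Read it back, marking strings as UTF-8 and trimming the blank after each comma.
msgs <- read.csv(csv_path, encoding = "UTF-8", strip.white = TRUE,
                 stringsAsFactors = FALSE)

# The trailing commas create an extra empty column, so pick the messages by position.
messages <- msgs[[2]]

# Count the messages that contain the winking-face emoji.
n_wink <- sum(grepl("\U0001f609", messages, fixed = TRUE))
```

The same grepl count could be repeated per emoji, or per person, to track usage over time.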