
Emoticons in Twitter Sentiment Analysis in R

How do I handle/get rid of emoticons so that I can sort tweets for sentiment analysis?

I am getting this error: Error in sort.list(y) : invalid input

Thanks

And this is how the emoticons look after coming from Twitter into R:

\xed��\xed�\u0083\xed��\xed��
\xed��\xed�\u008d\xed��\xed�\u0089 
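One plausible cause of the sort.list error (the full message is not shown above) is that some tweets are not valid UTF-8, so R cannot compare them while sorting. A minimal sketch, using base R's validUTF8() (available since R 3.3.0) on a made-up vector, that finds such strings and drops them before sorting:

```r
# Toy data: the third string contains a raw 0xFF byte, which is never valid UTF-8
tweets <- c("plain text", "caf\u00e9", "broken \xff byte")

ok <- validUTF8(tweets)   # TRUE TRUE FALSE
which(!ok)                # 3: the string that would need fixing or dropping

# Keep only the valid strings; logical indexing is safe even when nothing is invalid
clean <- tweets[ok]
sort(clean)
```

Dropping whole tweets loses data; the answers below instead strip the offending characters and keep the rest of the text.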
asked Apr 01 '13 17:04 by Rhodo



2 Answers

This should get rid of the emoticons, using iconv as suggested by ndoogan.

Some reproducible data:

require(twitteR) 
# note that I had to register my twitter credentials first
# here's the method: http://stackoverflow.com/q/9916283/1036500
s <- searchTwitter('#emoticons', cainfo="cacert.pem") 

# convert to data frame
df <- do.call("rbind", lapply(s, as.data.frame))

# inspect, yes there are some odd characters in row five
head(df)

                                                                                                                                                text
1                                                                      ROFLOL: echte #emoticons [humor] http://t.co/0d6fA7RJsY via @tweetsmania  ;-)
2 “@teeLARGE: when tmobile get the iphone in 2 wks im killin everybody w/ emoticons &amp; \nall the other stuff i cant see on android!" \n#Emoticons
3                      E poi ricevi dei messaggi del genere da tua mamma xD #crazymum #iloveyou #emoticons #aiutooo #bestlike http://t.co/Yee1LB9ZQa
4                                                #emoticons I want to change my name to an #emoticon. Is it too soon? #prince http://t.co/AgmR5Lnhrk
5  I use emoticons too much. #addicted #admittingit #emoticons <ed><U+00A0><U+00BD><ed><U+00B8><U+00AC><ed><U+00A0><U+00BD><ed><U+00B8><U+0081> haha
6                                                                                         What you text What I see #Emoticons http://t.co/BKowBSLJ0s
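The odd bytes in row 5 are not random. Each <ed><U+00A0><U+00BD>-style group of three bytes is one half of a UTF-16 surrogate pair written out as if it were a UTF-8 character (an encoding sometimes called CESU-8). Strict UTF-8 (RFC 3629) does not allow encoded surrogates, which is why R treats these bytes as invalid. As a sketch, with hypothetical helper names decode3() and surrogates_to_codepoint(), the first pair in row 5 can be decoded by hand:

```r
# Hypothetical helper: decode one 3-byte sequence 1110xxxx 10xxxxxx 10xxxxxx
# into the 16-bit value it carries
decode3 <- function(b) {
  b <- as.integer(b)
  bitwShiftL(bitwAnd(b[1], 0x0F), 12) +
    bitwShiftL(bitwAnd(b[2], 0x3F), 6) +
    bitwAnd(b[3], 0x3F)
}

# Hypothetical helper: combine a high and a low surrogate into the real code point
surrogates_to_codepoint <- function(bytes) {
  hi <- decode3(bytes[1:3])
  lo <- decode3(bytes[4:6])
  stopifnot(hi >= 0xD800, hi <= 0xDBFF, lo >= 0xDC00, lo <= 0xDFFF)
  0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
}

# First pair from row 5: <ed><U+00A0><U+00BD><ed><U+00B8><U+00AC>
cp <- surrogates_to_codepoint(as.raw(c(0xed, 0xa0, 0xbd, 0xed, 0xb8, 0xac)))
sprintf("U+%X", cp)   # "U+1F62C"
intToUtf8(cp)         # the emoji the tweet actually contained
```

The second pair, ed a0 bd ed b8 81, decodes the same way to U+1F601, so row 5 originally held two emoji.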

Here's the key line that will remove the emoticons:

# Clean text to remove odd characters (iconv is vectorized, so no sapply is needed)
df$text <- iconv(df$text, "latin1", "ASCII", sub = "")

Now inspect again to confirm that the odd characters are gone (see row 5):

head(df)    
                                                                                                                               text
1                                                                     ROFLOL: echte #emoticons [humor] http://t.co/0d6fA7RJsY via @tweetsmania  ;-)
2 @teeLARGE: when tmobile get the iphone in 2 wks im killin everybody w/ emoticons &amp; \nall the other stuff i cant see on android!" \n#Emoticons
3                     E poi ricevi dei messaggi del genere da tua mamma xD #crazymum #iloveyou #emoticons #aiutooo #bestlike http://t.co/Yee1LB9ZQa
4                                               #emoticons I want to change my name to an #emoticon. Is it too soon? #prince http://t.co/AgmR5Lnhrk
5                                                                                 I use emoticons too much. #addicted #admittingit #emoticons  haha
6                                                                                        What you text What I see #Emoticons http://t.co/BKowBSLJ0s
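A self-contained version of the same call, runnable without Twitter credentials, makes the trade-off visible. strip_non_ascii() is a hypothetical name, and the input strings are toy data. Note that this removes every non-ASCII character, not only emoticons: in row 2 above the opening curly quote before @teeLARGE disappeared too, and accented letters in non-English tweets are affected the same way.

```r
# Hypothetical helper wrapping the same iconv() call as above.
# Reading the input as latin1 makes every byte a single character, so each byte
# >= 0x80 (emoji, curly quotes, accented letters, invalid sequences) cannot be
# represented in ASCII and is replaced by sub = "", i.e. deleted.
strip_non_ascii <- function(x) {
  iconv(x, from = "latin1", to = "ASCII", sub = "")
}

# Toy data (not real tweets)
tweets <- c("I use emoticons too much \U0001F62C haha", "caf\u00e9 con leche")
strip_non_ascii(tweets)
# returns c("I use emoticons too much  haha", "caf con leche")
```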
answered Jan 08 '23 10:01 by Ben


I recommend the function ji_replace_all(string, replacement) from the emo package, which can be installed from GitHub with:

devtools::install_github("hadley/emo")

I needed to remove emojis from tweets written in Spanish. I tried several options, but some of them messed up the text. This one works perfectly:

library(emo)

text <- "#VIDEO 😢💔🙏🏻,Alguien sabe si en Afganistán hay cigarro?"

ji_replace_all(text,"")

Result:

"#VIDEO ,Alguien sabe si en Afganistán hay cigarro?"
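If installing emo from GitHub is not an option, a rough base-R alternative is a Perl-compatible regex over common emoji code point ranges. This is a swapped-in technique, not what emo does internally: the ranges below are an approximation chosen for this sketch, so they will miss some emoji and may remove a few non-emoji symbols. It also assumes the text is valid UTF-8; gsub() will typically stop with an error on invalid input such as the bytes in the question, where the iconv approach from the other answer is the safer choice.

```r
# Approximate emoji ranges (an assumption for this sketch, not emo's own pattern)
emoji_rx <- paste0(
  "[",
  "\\x{1F000}-\\x{1FAFF}",  # pictographs, emoticons, transport symbols, skin-tone modifiers, etc.
  "\\x{2600}-\\x{27BF}",    # miscellaneous symbols and dingbats
  "\\x{FE0F}",              # variation selector-16 (requests emoji presentation)
  "\\x{200D}",              # zero-width joiner used inside multi-part emoji
  "]"
)

# Hypothetical helper: delete every character matched by emoji_rx
# (perl = TRUE is needed for the \x{...} code point syntax)
strip_emoji <- function(x) gsub(emoji_rx, "", x, perl = TRUE)

text <- "#VIDEO \U0001F622\U0001F494\U0001F64F\U0001F3FB,Alguien sabe si en Afganist\u00e1n hay cigarro?"
strip_emoji(text)
# returns "#VIDEO ,Alguien sabe si en Afganistán hay cigarro?" (the accented letter survives)
```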

answered Jan 08 '23 12:01 by Jose Galarza