
Sentiment Analysis in R

I am new to sentiment analysis and have no idea how to go about it in R, so I would like some help and guidance.

I have a set of data consisting of opinions, and I would like to analyse the opinions.

Title      Date            Content    
Boy        May 13 2015     "She is pretty", Tom said. 
Animal     June 14 2015    The penguin is cute, lion added.
Human      March 09 2015   Mr Koh predicted that every human is smart..
Monster    Jan 22 2015     Ms May, a student, said that John has $10.80. 

Thank you.

poppp asked Sep 16 '15 02:09

1 Answer

Sentiment analysis encompasses a broad category of methods designed to measure positive versus negative sentiment in text, which makes this a fairly difficult question to answer simply. But here is a simple answer: you can apply a dictionary to your document-term matrix and then combine the counts for the positive and negative keys of your dictionary to create a sentiment measure.
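To make the dictionary idea concrete before turning to a package, here is a minimal base R sketch. It assumes a tiny made-up dictionary (`sentiment_dict` and its word lists are hypothetical, not a published lexicon), uses exact word matching only (no wildcards), and takes its text from the Content column in the question.

```r
# Hypothetical toy dictionary; a real analysis would use a published lexicon.
sentiment_dict <- list(
  positive = c("pretty", "cute", "smart", "good", "great"),
  negative = c("bad", "awful", "terrible", "horrible")
)

# Text taken from the Content column in the question.
texts <- c(
  Boy     = "\"She is pretty\", Tom said.",
  Animal  = "The penguin is cute, lion added.",
  Human   = "Mr Koh predicted that every human is smart..",
  Monster = "Ms May, a student, said that John has $10.80."
)

# Lowercase, then split on anything that is not a letter to get word tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Count how many tokens of one document fall under each dictionary key.
count_keys <- function(x, dict) {
  words <- tokenize(x)
  vapply(dict, function(entries) sum(words %in% entries), numeric(1))
}

# Documents in rows, dictionary keys in columns.
counts <- t(vapply(texts, count_keys, numeric(length(sentiment_dict)),
                   dict = sentiment_dict))

# One simple sentiment measure: positive matches minus negative matches.
net <- counts[, "positive"] - counts[, "negative"]
print(cbind(counts, net))
```

The matrix `counts` plays the role of a document-term matrix that has already been collapsed to dictionary keys, which is the step a text analysis package automates.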

I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and allows you to create very flexible custom dictionaries.

For example:

# Note: written against an early quanteda release; later releases renamed several of these functions.
require(quanteda)
# keep only the inaugural addresses delivered after 1980
mycorpus <- subset(inaugCorpus, Year>1980)
# each dictionary key maps to a set of patterns; "*" matches any trailing characters
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
# count, for each document, the tokens that match each dictionary key
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 2 keys
##    ... created a 9 x 2 sparse dfm
##    ... complete. 
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3

# use a LIWC dictionary - obviously you need this file
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 68 keys
##    ... created a 9 x 68 sparse dfm
##    ... complete. 
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22
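Once you have per-document counts like these, combining them into a single sentiment measure is simple arithmetic. The sketch below re-enters the custom-dictionary counts printed above by hand and computes two candidate measures. The column names `net` and `polarity` are illustrative choices, not quanteda output, and other normalisations (for example, dividing by document length) may suit your data better.

```r
# Custom-dictionary counts copied by hand from the output printed above.
counts <- data.frame(
  doc = c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton",
          "1997-Clinton", "2001-Bush", "2005-Bush", "2009-Obama",
          "2013-Obama"),
  negative = c(0, 0, 0, 1, 2, 1, 0, 2, 1),
  positive = c(6, 6, 18, 2, 8, 6, 8, 3, 3),
  stringsAsFactors = FALSE
)

# Net sentiment: positive matches minus negative matches.
counts$net <- counts$positive - counts$negative

# Polarity in [-1, 1]: net sentiment relative to all dictionary matches.
# A document with no matches at all would give 0 / 0 = NaN here.
counts$polarity <- counts$net / (counts$positive + counts$negative)

print(counts)
```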

For your corpus, assuming that you get it into a data.frame called data, you can create a quanteda corpus using:

mycorpus <- corpus(data$Content, docvars = data[, 1:2])

See also ?textfile for loading content from files in one easy command. This works with .csv files, for instance, although you would have problems with your file as shown, because the Content field contains commas that would be read as field separators unless that field is quoted.
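As a small illustration of the comma issue, here is a base R sketch that reads two of the rows from the question with the Content field wrapped in double quotes. The inline `csv_text` string stands in for a file on disk and is an assumption made for the example; with a real file you would pass its path to `read.csv()` instead.

```r
# Two rows from the question, written as CSV with the Content field quoted
# so that the commas inside it are not treated as field separators.
csv_text <- paste(
  'Title,Date,Content',
  'Animal,June 14 2015,"The penguin is cute, lion added."',
  'Monster,Jan 22 2015,"Ms May, a student, said that John has $10.80."',
  sep = "\n"
)

# read.csv() honours double-quoted fields by default.
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(data)
print(data$Content)
```

The resulting `data` has the three columns Title, Date and Content, so it has the shape assumed by the `corpus()` call above.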

There are many other ways to measure sentiment, of course, but if you are new to sentiment mining and R, that should get you started. You can read more on sentiment mining methods (and apologies if you have already encountered them) from:

Ken Benoit answered Nov 14 '22 22:11