I am new to sentiment analysis and have no idea how to go about it in R, so I would like some help and guidance.
I have a data set of opinions that I would like to analyse:
Title    Date           Content
Boy      May 13 2015    "She is pretty", Tom said.
Animal   June 14 2015   The penguin is cute, lion added.
Human    March 09 2015  Mr Koh predicted that every human is smart..
Monster  Jan 22 2015    Ms May, a student, said that John has $10.80.
Thank you.
Sentiment analysis encompasses a broad category of methods designed to measure positive versus negative sentiment from text, so that makes this a fairly difficult question to answer simply. But here is a simple answer: You can apply a dictionary to your document-term matrix and then combine the positive versus negative key categories of your dictionary to create a sentiment measure.
I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and allows you to create very flexible custom dictionaries.
For example:
require(quanteda)
# keep only the inaugural addresses delivered after 1980
mycorpus <- subset(inaugCorpus, Year > 1980)
# dictionary values are glob-style patterns, e.g. "bad*" matches words starting with "bad"
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
# count the matches for each dictionary key in each document
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 2 keys
## ... created a 9 x 2 sparse dfm
## ... complete.
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3
# use a LIWC dictionary - obviously you need this file
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 68 keys
## ... created a 9 x 68 sparse dfm
## ... complete.
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22
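To turn counts like these into a single sentiment measure, one option is a net score per document. The following is only a sketch in base R: it copies the positive and negative counts from the custom-dictionary output above into a plain data frame, and the formula (positive - negative) / (positive + negative) is an assumed, commonly used choice, not something quanteda computes for you.

```r
# Counts copied by hand from the custom-dictionary output above
counts <- data.frame(
  doc      = c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton",
               "1997-Clinton", "2001-Bush", "2005-Bush", "2009-Obama",
               "2013-Obama"),
  negative = c(0, 0, 0, 1, 2, 1, 0, 2, 1),
  positive = c(6, 6, 18, 2, 8, 6, 8, 3, 3)
)

# Assumed scoring rule: net sentiment in [-1, 1];
# NA when a document has no dictionary matches at all
total <- counts$positive + counts$negative
counts$net <- ifelse(total > 0,
                     (counts$positive - counts$negative) / total,
                     NA_real_)

# Documents ordered from least to most positive
counts[order(counts$net), ]
```

Other rules (for example a log ratio, or normalising by document length) may suit your data better; this one just keeps the arithmetic easy to check.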
For your corpus, assuming that you get it into a data.frame called data, you can create a quanteda corpus using:
mycorpus <- corpus(data$Content, docvars = data[, 1:2])
See also ?textfile for loading content from files in one easy command. This works with .csv files, for instance, although you would have problems with that file because the Content field itself contains commas.
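If the Content values are wrapped in double quotes in the file, the embedded commas should not split the field. A minimal sketch in base R, assuming a hypothetical CSV laid out like two rows of the example data (the inline text stands in for a real file, which you would pass to read.csv() by path instead):

```r
# Hypothetical CSV content; each Content value is quoted so that its
# commas are read as text rather than as column separators
csv_text <- paste(
  'Title,Date,Content',
  'Animal,June 14 2015,"The penguin is cute, lion added."',
  'Monster,Jan 22 2015,"Ms May, a student, said that John has $10.80."',
  sep = "\n"
)

# Keep text columns as character vectors rather than factors
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(data)
```

The resulting data frame has the Title, Date and Content columns assumed by the corpus() call above.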
There are many other ways to measure sentiment of course, but if you are new to sentiment mining and R, that should get you started. You can read more on sentiment mining methods (and apologies if you already have encountered them) from: