Sentimental Analysis of review comments using qdap is slow

I am using the qdap package to determine the sentiment of each review comment for a particular application. I read the review comments from a CSV file and pass them to qdap's polarity function. Everything works and I get the polarity for every review comment, but it takes 7-8 seconds to calculate the polarity of all the sentences (the CSV file contains 779 sentences in total). My code is below.

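  library(qdap)  # also attaches qdapDictionaries (positive.words, negative.words)
  temp_csv <- filePath()                                # data frame read from the CSV
  text_data <- as.character(temp_csv[, "Content"])      # review comments; no attach() needed
  print(Sys.time())
  polterms <- list(neg = c("wtf"))                      # custom negative term
  POLKEY <- sentiment_frame(positives = positive.words,
                            negatives = c(polterms[[1]], negative.words))
  review_polarity <- polarity(text_data, polarity.frame = POLKEY)  # score the review text
  print(Sys.time())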

The time taken is as follows:

[1] "2016-04-12 16:43:01 IST"

[1] "2016-04-12 16:43:09 IST"
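Printing Sys.time() before and after works, but base R's system.time() does the bookkeeping for you and reports elapsed wall-clock seconds directly. In this minimal sketch, `score_reviews()` is a hypothetical stand-in (it only sleeps) for the `polarity()` call above; swap in the real call to time it.

```r
# Base-R timing with system.time() instead of two Sys.time() prints.
# `score_reviews()` is a hypothetical placeholder for the slow step,
# e.g. polarity(text_data, polarity.frame = POLKEY) from the question.
score_reviews <- function() {
  Sys.sleep(0.25)  # simulate roughly 0.25 s of work
  invisible(NULL)
}

timing <- system.time(score_reviews())
print(timing)  # user / system / elapsed
cat("Elapsed:", round(timing[["elapsed"]], 2), "seconds\n")
```

The "elapsed" entry is the wall-clock figure comparable to the difference between the two printed timestamps.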

Can somebody let me know if I am doing something wrong? How can I improve the performance?

asked Apr 12 '16 at 12:04 by VenuSathya20


1 Answer

I am the author of qdap. The polarity function was designed for much smaller data sets. As my role shifted, I began to work with larger data sets. I needed something both fast and accurate (two goals that pull against each other), so I have since developed a breakaway package, sentimentr. Its algorithm is optimized to be faster and more accurate than qdap's polarity.

As it stands now, you have 5 dictionary-based (or trained-algorithm-based) approaches to sentiment detection. Each has its drawbacks (-) and pluses (+) and is useful in certain circumstances.

  1. qdap +on CRAN; -slow
  2. syuzhet +on CRAN; +fast; +great plotting; -less accurate on non-literature use
  3. sentimentr +fast; +higher accuracy; -GitHub only
  4. stansent (stanford port) +most accurate; -slower
  5. tm.plugin.sentiment -archived on CRAN; -I couldn't get it working easily
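"Dictionary based" means, at its simplest, looking each word up in positive and negative word lists. The sketch below is a deliberately stripped-down base-R illustration of that idea; it is not the algorithm used by qdap, syuzhet, or sentimentr, and the short word lists are made up for the example (the question's custom "wtf" term is included as a negative). Dividing by the square root of the word count loosely mirrors the normalization qdap's polarity() documents, but the real packages also weigh negators and amplifiers around each polarized word, which this sketch ignores.

```r
# A deliberately simplified dictionary-based scorer (NOT the qdap, syuzhet,
# or sentimentr algorithm): +1 per positive word, -1 per negative word,
# divided by sqrt(word count). The tiny word lists are hypothetical.
positive <- c("love", "great", "good", "fast")
negative <- c("slow", "crash", "terrible", "wtf")

dict_score <- function(text, pos = positive, neg = negative) {
  lowered <- tolower(text)
  # Extract runs of letters/apostrophes as words, one character vector per text.
  words <- regmatches(lowered, gregexpr("[a-z']+", lowered))
  vapply(words, function(w) {
    if (length(w) == 0) return(0)               # empty text scores 0
    hits <- sum(w %in% pos) - sum(w %in% neg)   # dictionary lookups
    hits / sqrt(length(w))                      # normalize by sqrt(word count)
  }, numeric(1))
}

reviews <- c("I love this app, it is great",
             "Too slow and the last update was terrible",
             "It opens")
print(round(dict_score(reviews), 3))
```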

In the code below I run timing tests on sample data for the first 4 choices above.

Install packages and make timing functions

I use pacman because it lets the reader just run the code, though you can replace it with install.packages and library calls.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(qdap, syuzhet, dplyr)
pacman::p_load_current_gh(c("trinker/stansent", "trinker/sentimentr"))

pres_debates2012 #nrow = 2912

tic <- function (pos = 1, envir = as.environment(pos)){
    assign(".tic", Sys.time(), pos = pos, envir = envir)
    Sys.time()
}

toc <- function (pos = 1, envir = as.environment(pos)) {
    difftime(Sys.time(), get(".tic", pos = pos, envir = envir))
}

id <- 1:2912
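The tic()/toc() pair above stores its start time in a hidden `.tic` variable in the global environment. If you would rather not touch the global environment, a closure can hold the start time instead. `make_timer()` is a hypothetical helper written for this sketch, not a function from pacman or any other package.

```r
# Hypothetical closure-based alternative to tic()/toc(): the start time
# lives in make_timer()'s environment instead of a global `.tic` variable.
make_timer <- function() {
  start <- NULL
  list(
    tic = function() {
      start <<- Sys.time()  # update `start` in the enclosing environment
      invisible(start)
    },
    toc = function() {
      if (is.null(start)) stop("call tic() first")
      difftime(Sys.time(), start, units = "secs")
    }
  )
}

timer <- make_timer()
timer$tic()
Sys.sleep(0.1)  # stand-in for the code being timed
elapsed <- timer$toc()
print(elapsed)
```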

Timings

## qdap
tic()
qdap_sent <- pres_debates2012 %>%
    with(qdap::polarity(dialogue, id))
toc() # Time difference of 18.14443 secs


## sentimentr
tic()
sentimentr_sent <- pres_debates2012 %>%
    with(sentiment(dialogue, id))
toc() # Time difference of 1.705685 secs


## syuzhet
tic()
syuzhet_sent <- pres_debates2012 %>%
    with(get_sentiment(dialogue, method="bing"))
toc() # Time difference of 1.183647 secs


## stanford
tic()
stanford_sent <- pres_debates2012 %>%
    with(sentiment_stanford(dialogue))
toc() # Time difference of 6.724482 mins
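To apply the faster options to review comments like those in the question, a minimal sketch might look like the following. It assumes sentimentr and syuzhet are installed, and the three reviews are made up (in the question they would come from the CSV's Content column). The question's custom "wtf" term is not carried over here; sentimentr's lexicon can be extended with its update_key() helper, so check that function's documentation for the exact arguments.

```r
# Sketch: scoring review comments with two of the faster packages above.
# The reviews are hypothetical; assumes sentimentr and syuzhet are installed.
library(sentimentr)
library(syuzhet)

reviews <- c("I love this app and it works great.",
             "It crashes constantly and the update is terrible.",
             "The app opens.")

# sentimentr: split into sentences once, then average per review
# (one row per element of `reviews`, with an ave_sentiment column).
review_sentences <- get_sentences(reviews)
sentimentr_scores <- sentiment_by(review_sentences)
print(sentimentr_scores)

# syuzhet: one numeric score per review using the Bing lexicon.
syuzhet_scores <- get_sentiment(reviews, method = "bing")
print(syuzhet_scores)
```

Running get_sentences() first means sentence splitting is done once up front rather than inside sentiment_by().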

For more on timing and accuracy, see my sentimentr README.md (and please star the repo if it's useful). The plot below shows one of the tests from the README:

[Figure: one of the tests from the sentimentr README]

answered Sep 28 '22 at 08:09 by Tyler Rinker