
TextBlob NaiveBayesAnalyzer extremely slow (compared to Pattern)

I'm using TextBlob for Python to do some sentiment analysis on tweets. The default analyzer in TextBlob is the PatternAnalyzer, which works reasonably well and is appreciably fast.

sent = TextBlob(tweet.decode('utf-8')).sentiment

I have now tried to switch to the NaiveBayesAnalyzer and found the runtime to be impractical for my needs. (Approaching 5 seconds per tweet.)

sent = TextBlob(tweet.decode('utf-8'), analyzer=NaiveBayesAnalyzer()).sentiment

I have used the scikit-learn implementation of the Naive Bayes classifier before and did not find it to be this slow, so I'm wondering if I'm using TextBlob's analyzer correctly in this case.

I am assuming the analyzer is pretrained; at least, the documentation states "Naive Bayes analyzer that is trained on a dataset of movie reviews." But then it also has a function train(), which is described as "Train the Naive Bayes classifier on the movie review corpus." Does it internally train the analyzer before each run? I hope not.

Does anyone know of a way to speed this up?

Matt M. asked Oct 20 '15 16:10



1 Answer

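Yes, TextBlob trains the analyzer before each run: every new NaiveBayesAnalyzer() instance has to be trained on first use, and your code constructs a new one for each tweet. You can use the following code to avoid training the analyzer every time. A Blobber creates the analyzer once and shares it across all the blobs it produces.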

from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer
tb = Blobber(analyzer=NaiveBayesAnalyzer())

print(tb("sentence you want to test").sentiment)
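
To illustrate why sharing one analyzer helps, here is a minimal sketch that does not use TextBlob at all. `LazyAnalyzer` is a hypothetical stand-in, and the sketch assumes the real analyzer behaves similarly, i.e. that each instance trains itself the first time it is asked to analyze something. It only counts how often training happens under the two usage patterns.

```python
class LazyAnalyzer:
    """Hypothetical stand-in (not a TextBlob class) that trains on first use."""

    train_calls = 0  # class-wide counter of how many times train() has run

    def __init__(self):
        self._trained = False

    def train(self):
        # In a real analyzer this would be the expensive step.
        LazyAnalyzer.train_calls += 1
        self._trained = True

    def analyze(self, text):
        if not self._trained:
            self.train()
        # Toy scoring rule, for illustration only.
        return "pos" if "good" in text.lower() else "neg"


tweets = ["good morning", "bad traffic", "good coffee"]

# Pattern from the question: a fresh analyzer for every tweet.
per_tweet = [LazyAnalyzer().analyze(t) for t in tweets]
trains_per_tweet = LazyAnalyzer.train_calls

# Pattern from the answer: one analyzer shared by all tweets.
LazyAnalyzer.train_calls = 0
shared = LazyAnalyzer()
reused = [shared.analyze(t) for t in tweets]
trains_shared = LazyAnalyzer.train_calls

print(trains_per_tweet, trains_shared)
```

Both patterns give the same labels, but the first trains once per tweet and the second trains only once in total, which is the saving the Blobber is meant to provide.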
Alan answered Oct 16 '22 01:10
