So, I'm trying to do multiclass text classification. I have read a lot of old questions and blog posts, but I still don't fully understand the concept.
I also tried some examples from this blog post: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/
But when it comes to multiclass classification, I don't quite understand how it works. Let's say I want to classify text into several languages: French, English, Italian and German. I want to use Naive Bayes, which I think would be the easiest to start with. From what I have read in the old questions, the simplest solution would be to use one-vs-all, so each language would have its own model. I would then have four models, one each for French, English, Italian and German. Then I would run a text against every model and check which one gives the highest probability. Am I correct?
But when it comes to coding, the example above has tweets like these, each labeled either positive or negative:
pos_tweets = [('I love this car', 'positive'),
('This view is amazing', 'positive'),
('I feel great this morning', 'positive'),
('I am so excited about tonight\'s concert', 'positive'),
('He is my best friend', 'positive')]
neg_tweets = [('I do not like this car', 'negative'),
('This view is horrible', 'negative'),
('I feel tired this morning', 'negative'),
('I am not looking forward to tonight\'s concert', 'negative'),
('He is my enemy', 'negative')]
Each tweet is either positive or negative. So, when it comes to training one model for French, how should I tag the text? Would it be like this, with these as the positive examples?
[('Bon jour', 'French'),
('je m\'appelle', 'French')]
And the negative would be
[('Hello', 'English'),
('My name', 'English')]
But wouldn't this mean I could just add Italian and German and have a single model for all four languages? Or do I not really need the negatives?
So, the question is: what's the right approach to multiclass classification with NLTK?
There's no need for a one-vs-all scheme with Naive Bayes; it's a multiclass model out of the box. Just feed a list of (sample, label) pairs to the classifier learner, where label denotes the language.
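As a minimal sketch of that idea, assuming NLTK is installed: a single `nltk.NaiveBayesClassifier` is trained on one list of (features, label) pairs covering all four languages. The toy sentences, the `word_features` helper and the `classify_language` wrapper are made up for illustration and are not from the blog post; a real language detector would need far more data and probably character n-gram features.

```python
import nltk


def word_features(text):
    """Hypothetical feature extractor: one boolean feature per lowercase word."""
    return {word: True for word in text.lower().split()}


# Made-up toy corpus: every sample carries its language as the label.
# There are no "negative" examples; the other languages play that role.
train_texts = [
    ("bonjour je m'appelle Marie", "French"),
    ("je suis très content", "French"),
    ("hello my name is Mary", "English"),
    ("I am very happy", "English"),
    ("ciao mi chiamo Maria", "Italian"),
    ("sono molto felice", "Italian"),
    ("hallo ich heiße Maria", "German"),
    ("ich bin sehr glücklich", "German"),
]

# NLTK expects (feature_dict, label) pairs.
train_set = [(word_features(text), label) for text, label in train_texts]

# One model handles all four classes at once.
classifier = nltk.NaiveBayesClassifier.train(train_set)


def classify_language(text):
    """Return the most probable language label for a text."""
    return classifier.classify(word_features(text))


if __name__ == "__main__":
    print(sorted(classifier.labels()))
    for sample in ["je suis Marie", "ich bin Maria", "hello my name"]:
        print(sample, "->", classify_language(sample))
```

If you also want the per-language probabilities rather than just the winner, `classifier.prob_classify(features)` returns a distribution you can query with `.prob(label)`.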