Text classification in Python (NLTK, sentence-based)

I need to classify text, and I am using the TextBlob Python module to do it. I can use either a Naive Bayes classifier or a decision tree. I am concerned about the points below.

1) I need to classify sentences as argument / not an argument. I am using two classifiers and training the model on suitable data sets. Do I need to train the model with keywords only, or can I train it on all possible argument and non-argument sample sentences? Which approach would be best in terms of classification accuracy and retrieval time?

2) Since the classification is binary (argument / not an argument), which classifier would give the most accurate results: Naive Bayes, decision tree, or positive Naive Bayes?

Thanks in advance.

asked Apr 20 '14 04:04

sreram

1 Answer

The usual advice is that the more data you train on, the better your results will be, but you can only confirm that by testing the model and comparing its predictions against real labeled results you have prepared.

So, to answer your question: training the model on keywords alone may give results that are too broad and include sentences that are not arguments. You need something to compare against, though, so I suggest also training a model on the sentence structure that arguments tend to follow (a pattern of some sort), which might eliminate the sentences that are not arguments. Then test it to see whether you get higher accuracy than with the previous model.
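To make the two options concrete, here is a minimal sketch of a keyword-only feature extractor next to a whole-sentence one. It uses only the standard library and returns feature dictionaries, the shape NLTK-style classifiers generally work with. The cue words in `ARGUMENT_KEYWORDS` and the example sentence are made up for illustration; a real list would have to come from your own data.

```python
import re

# Hypothetical cue words, chosen only for illustration.
ARGUMENT_KEYWORDS = {"because", "therefore", "should", "since", "thus"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def keyword_features(sentence):
    """Keyword-only features: one boolean per cue word."""
    words = set(tokenize(sentence))
    return {f"has({kw})": kw in words for kw in sorted(ARGUMENT_KEYWORDS)}


def sentence_features(sentence):
    """Whole-sentence features: every word, plus adjacent word pairs
    as a crude stand-in for sentence structure."""
    tokens = tokenize(sentence)
    features = {f"word({t})": True for t in tokens}
    features.update({f"pair({a} {b})": True for a, b in zip(tokens, tokens[1:])})
    return features


example = "We should leave now because the roads will flood."
print(keyword_features(example))
print(sorted(sentence_features(example)))
```

The keyword version produces a small, fixed set of features, while the sentence version grows with the vocabulary. Training one model on each and comparing accuracy on the same held-out sentences is the test described above.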

To answer your next question (which approach would be best in terms of classification accuracy and retrieval time?): it really depends on the data you're using. I can't answer this directly, because you have to perform cross-validation to see whether your model achieves high accuracy. In general, the more features you extract, the slower your learning algorithm trains and predicts. And if you have gigabytes of text to analyze, I suggest using MapReduce for the job.
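As a rough illustration of cross-validation, the sketch below splits a labeled data set into k folds and reports accuracy on each held-out fold. To stay dependency-free it uses a small hand-rolled Naive Bayes with add-one smoothing, not TextBlob's classifier, and the six sentences in `DATA` are invented toy examples.

```python
import math
import random
from collections import Counter, defaultdict

# Invented toy data; replace with your own labeled sentences.
DATA = [
    ("we should leave because the roads will flood", "argument"),
    ("taxes must rise since the deficit is growing", "argument"),
    ("you are wrong because the data says otherwise", "argument"),
    ("the meeting starts at noon", "not_argument"),
    ("she bought a red bicycle yesterday", "not_argument"),
    ("the report has twelve pages", "not_argument"),
]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """Count labels and per-label words; returns (label_counts, word_counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(examples, k=3, seed=0):
    """Shuffle once, split into k folds, return accuracy on each held-out fold."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, held_out in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_naive_bayes(training)
        correct = sum(predict(model, text) == label for text, label in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


accuracies = cross_validate(DATA, k=3)
print("per-fold accuracy:", accuracies)
print("mean accuracy:", sum(accuracies) / len(accuracies))
```

With so few sentences the numbers mean nothing; the point is the procedure. Every sentence is used for testing exactly once, and the model never sees a sentence it is later scored on.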

You might also want to try SVMs as your learning model: test them alongside the other models (Naive Bayes, positive Naive Bayes and decision trees) and see which one performs best.
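One way to run that comparison is sketched below. It assumes scikit-learn is installed and swaps it in for TextBlob, since TextBlob is not assumed here to provide an SVM. The twelve sentences are invented toy data, and positive Naive Bayes is left out because this sketch does not assume a scikit-learn equivalent.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Invented toy data; replace with your own labeled sentences.
sentences = [
    "we should leave because the roads will flood",
    "taxes must rise since the deficit is growing",
    "you are wrong because the data says otherwise",
    "schools should open later since teenagers need sleep",
    "the plan will fail because nobody funded it",
    "we must act now therefore delay is not an option",
    "the meeting starts at noon",
    "she bought a red bicycle yesterday",
    "the report has twelve pages",
    "it rained in the morning",
    "the train arrives on platform four",
    "he lives near the old library",
]
labels = ["argument"] * 6 + ["not_argument"] * 6

candidates = {
    "naive_bayes": MultinomialNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "linear_svm": LinearSVC(),
}

mean_accuracy = {}
for name, estimator in candidates.items():
    # Same bag-of-words features and same folds for every model,
    # so the scores are directly comparable.
    pipeline = make_pipeline(CountVectorizer(), estimator)
    scores = cross_val_score(pipeline, sentences, labels, cv=3)
    mean_accuracy[name] = scores.mean()

for name, score in sorted(mean_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Whichever model scores best on your real data under the same features and folds is the one to keep; the ranking on toy data like this is not informative.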

Hope this helps.

answered Sep 23 '22 04:09

macmania314

Tags: python, python-3.x, machine-learning, classification, bayesian
