Scikit learn - How to use SVM and Random Forest for text classification?

I have a set of trainFeatures and a set of testFeatures with positive, neutral and negative labels:

trainFeats = negFeats + posFeats + neutralFeats
testFeats  = negFeats + posFeats + neutralFeats

For example, one entry inside the trainFeats is

(['blue', 'yellow', 'green'], 'POSITIVE')

The list of test features has the same format, so every entry in both sets is labeled. How can I use the scikit-learn implementations of the Random Forest classifier and SVM to get the accuracy of each classifier, together with precision and recall scores for each class? The problem is that I am currently using words as features, while from what I have read these classifiers require numeric input. Is there a way to achieve this without changing functionality? Many thanks!

asked Feb 23 '14 20:02

Crista23

1 Answer

You can look into this scikit-learn tutorial, especially the section on learning and predicting, for how to create and use a classifier. The example uses an SVM, but it is simple to swap in RandomForestClassifier instead, since all scikit-learn classifiers implement the fit and predict methods.
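As a minimal sketch of that interchangeability (the toy numeric data and the variable names below are made up for illustration and are not from the tutorial), both classifiers can be trained and queried through the same two calls:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters of numeric features.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["NEGATIVE", "NEGATIVE", "NEGATIVE",
           "POSITIVE", "POSITIVE", "POSITIVE"]
X_test = [[0.5, 0.5], [5.5, 5.5]]

# Both estimators expose the same fit/predict interface,
# so the surrounding code does not change when the model does.
classifiers = {
    "svm": SVC(kernel="linear"),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                       # learn from labeled examples
    predictions[name] = list(clf.predict(X_test))   # predict labels for new rows
    print(name, predictions[name])
```

The choice of a linear kernel and 50 trees is arbitrary here; any other scikit-learn classifier could be added to the dictionary in the same way.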

When working with text features, you can use CountVectorizer or DictVectorizer to turn words into the numeric features the classifiers expect. Take a look at the feature extraction documentation, especially section 4.1.3.
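For word lists shaped like the ones in the question, one possible approach is sketched below. It assumes each entry is a (tokens, label) tuple; the sample entries and the identity helper are hypothetical. Because the documents are already tokenized, CountVectorizer is given a callable analyzer that returns the tokens unchanged:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical entries in the (tokens, label) format from the question.
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["grey", "black"], "NEGATIVE"),
    (["white", "grey"], "NEUTRAL"),
]
test_feats = [(["green", "blue", "purple"], "POSITIVE")]

# Separate the token lists from their labels.
train_tokens = [tokens for tokens, _ in train_feats]
y_train = [label for _, label in train_feats]
test_tokens = [tokens for tokens, _ in test_feats]
y_test = [label for _, label in test_feats]


def identity(tokens):
    """Return the pre-tokenized document as-is (no further splitting)."""
    return tokens


# A callable analyzer makes CountVectorizer count the given tokens directly.
vectorizer = CountVectorizer(analyzer=identity)
X_train = vectorizer.fit_transform(train_tokens)  # learn vocabulary, build counts
X_test = vectorizer.transform(test_tokens)        # reuse the training vocabulary

print(sorted(vectorizer.vocabulary_))
print(X_train.shape, X_test.shape)
```

Note that the vocabulary is learned from the training set only, so a test word that never appeared in training (here "purple") is ignored by transform.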

You can find an example for classifying text documents here.

Then you can get the per-class precision and recall of the classifier with classification_report from sklearn.metrics; accuracy_score gives the overall accuracy.
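Putting the pieces together, a rough end-to-end sketch might look like the following. The tiny data set, the split_feats and identity helpers, and the hyperparameters are all assumptions chosen for illustration; real data would need far more examples for the scores to be meaningful:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC

# Hypothetical (tokens, label) entries in the format from the question.
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["green", "bright"], "POSITIVE"),
    (["grey", "black"], "NEGATIVE"),
    (["black", "dark"], "NEGATIVE"),
    (["white", "plain"], "NEUTRAL"),
    (["plain", "beige"], "NEUTRAL"),
]
test_feats = [
    (["green", "blue"], "POSITIVE"),
    (["black", "grey"], "NEGATIVE"),
    (["plain", "white"], "NEUTRAL"),
]


def split_feats(feats):
    """Split (tokens, label) tuples into a token-list list and a label list."""
    return [tokens for tokens, _ in feats], [label for _, label in feats]


def identity(tokens):
    """Pass pre-tokenized documents through unchanged."""
    return tokens


train_tokens, y_train = split_feats(train_feats)
test_tokens, y_test = split_feats(test_feats)

# Turn word lists into count vectors using the training vocabulary.
vectorizer = CountVectorizer(analyzer=identity)
X_train = vectorizer.fit_transform(train_tokens)
X_test = vectorizer.transform(test_tokens)

classifiers = {
    "svm": SVC(kernel="linear"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),         # overall accuracy
        "report": classification_report(y_test, y_pred),    # per-class scores
    }
    print(name, "accuracy:", results[name]["accuracy"])
    print(results[name]["report"])
```

The report printed for each classifier lists precision, recall and F1 for every label, which covers the per-class scores asked for in the question.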

answered Sep 19 '22 19:09

dnll

Tags: python, machine-learning, classification, scikit-learn
