
Can TF-IDF take classes into account?

Using a classification algorithm (for example naive Bayes or SVM) together with StringToWordVector, would it be possible to use TF-IDF but count term frequency over the whole current class instead of just within a single document?

Let me explain: I would like the computation to give a high score to words that are very frequent in a given class (not just in a given document) but not very frequent in the whole corpus.

Is this possible out of the box, or does it need some extra development?

Thanks :)

Loic asked Oct 11 '13 15:10


People also ask

Can we use TF-IDF for classification?

It is possible to classify bodies of text by looking at the frequencies of words in the text. In this post we will look at doing just that. This tool can be used to classify emails as spam or ham, to classify news as real or fake, or a myriad of other things.

What are two limitations of the TF-IDF representation?

However, TF-IDF has several limitations: – It computes document similarity directly in the word-count space, which may be slow for large vocabularies. – It assumes that the counts of different words provide independent evidence of similarity. – It makes no use of semantic similarities between words.

What is class based TF-IDF?

c-TF-IDF is a class-based TF-IDF procedure that can be used to generate features from textual documents based on the class they are in. Typical applications: informative words per class (which words make a class stand out compared to all others?) and class reduction (using c-TF-IDF to reduce the number of classes).
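A minimal pure-Python sketch of the class-based idea: all documents of a class are merged into one bag of words, and the "document" frequency in the idf term is counted over classes rather than over individual documents. The corpus here is invented for illustration, and the exact formula used by real libraries (such as BERTopic) differs slightly.

```python
import math
from collections import Counter

def c_tf_idf(docs_by_class):
    # Treat all documents of a class as one big document, then weight
    # each term by its within-class frequency times an idf computed
    # over classes rather than over individual documents.
    tf = {c: Counter(w for d in docs for w in d.split())
          for c, docs in docs_by_class.items()}
    n_classes = len(tf)
    # "Document" frequency at the class level: the number of classes
    # in which each term occurs at least once.
    df = Counter(t for counts in tf.values() for t in counts)
    scores = {}
    for c, counts in tf.items():
        total = sum(counts.values())
        scores[c] = {t: (n / total) * math.log(n_classes / df[t])
                     for t, n in counts.items()}
    return scores

# Tiny invented corpus: two classes, two documents each.
corpus = {
    "spam": ["win money now", "win a prize now"],
    "ham":  ["meeting at noon", "see you now at the meeting"],
}
s = c_tf_idf(corpus)
# "win" occurs only in the spam class, so it gets a positive score
# there; "now" occurs in both classes, so its class-level idf is
# log(2/2) = 0 and its score is zero.
```

This matches the asker's goal: a term frequent in one class but spread across the corpus is suppressed, while a class-specific term is boosted.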

What can TF-IDF be used for?

TF-IDF stands for term frequency-inverse document frequency. It is a measure, used in the fields of information retrieval (IR) and machine learning, that quantifies the importance or relevance of string representations (words, phrases, lemmas, etc.) in a document amongst a collection of documents (also known as a ...
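For a concrete illustration, here is a minimal pure-Python TF-IDF computation over a made-up three-document corpus. It uses the plain log(N/df) variant of idf; real libraries typically add smoothing and normalisation.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration).
docs = [
    "the cat sat on the mat",
    "the dog barked at the cat",
    "stocks fell on the heavy market",
]
tokenised = [d.split() for d in docs]
N = len(tokenised)
# Document frequency: in how many documents does each term occur?
df = Counter(t for doc in tokenised for t in set(doc))

def tf_idf(doc):
    counts = Counter(doc)
    return {t: (n / len(doc)) * math.log(N / df[t])
            for t, n in counts.items()}

weights = tf_idf(tokenised[0])
# "the" occurs in every document, so its idf is log(3/3) = 0 and its
# weight is zero; "sat" and "mat" are unique to this document and
# score highest.
```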


2 Answers

I would like the computation to give high score to words that are very frequent for a given class (not just for a given document) but not very frequent in the whole corpus.

You seem to want supervised term weighting. I'm not aware of any off-the-shelf implementation of that, but there's a host of literature about it. E.g. the weighting scheme tf-χ² replaces idf with the result of a χ² independence test, so terms that are statistically dependent on particular classes get boosted; several other such schemes exist.
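The χ² part of such a scheme can be sketched in plain Python: for each term, build the contingency table of term presence vs. class and compute the χ² statistic. This is a toy illustration on invented data with binary term presence; a real tf-χ² weight would multiply each term's tf by this statistic.

```python
import math

def chi2_weights(labelled_docs):
    # Supervised weighting: replace idf with a chi-squared statistic
    # measuring how strongly each term's presence depends on the class.
    N = len(labelled_docs)
    docs = [(set(text.split()), label) for text, label in labelled_docs]
    classes = {label for _, label in docs}
    terms = {t for words, _ in docs for t in words}
    weights = {}
    for t in terms:
        stat = 0.0
        for c in classes:
            for present in (True, False):
                observed = sum(1 for words, label in docs
                               if (t in words) == present and label == c)
                p_term = sum(1 for words, _ in docs
                             if (t in words) == present) / N
                p_class = sum(1 for _, label in docs if label == c) / N
                # Expected count if term presence were independent of class.
                expected = N * p_term * p_class
                if expected > 0:
                    stat += (observed - expected) ** 2 / expected
        weights[t] = stat
    return weights

# Invented labelled corpus for illustration.
data = [
    ("win money now", "spam"),
    ("win a prize", "spam"),
    ("meeting at noon", "ham"),
    ("see you at noon", "ham"),
]
w = chi2_weights(data)
# "win" perfectly separates spam from ham on this corpus (chi^2 = 4);
# "now" appears in only one document, so its dependence on the class
# label is weaker.
```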

Tf-idf itself is by its very nature unsupervised.

Fred Foo answered Oct 05 '22 14:10


I think you're confusing yourself here: what you're asking for is essentially the feature weight on that term for documents of that class. This is exactly what the learning algorithm is intended to optimise. Just worry about a useful representation of documents, which must necessarily be invariant to the class they belong to (since you won't know the class of unseen test documents).

Ben Allison answered Oct 05 '22 14:10