TF-IDF and Cosine Similarity is a commonly used combination for text clustering: each document is represented by a vector of TF-IDF weights. This is what my textbook says. With Cosine Similarity you can then compute the similarity between those documents.

But why exactly are those techniques used together? What is the advantage? Could, for example, Jaccard Similarity also be used? I know how these techniques work, but I want to know why exactly this pair is used.
TF-IDF will give you a representation for a given term in a document. Cosine similarity will give you a score for two different documents that share the same representation. However, "one of the simplest ranking functions is computed by summing the tf–idf for each query term".
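A minimal sketch of that ranking idea (the `tfidf` structure mapping document ids to per-term weights is an illustrative assumption, not any library's API):

```python
# Hypothetical sketch: rank documents by summing the tf-idf weights of the query terms.
# `tfidf` maps each document id to a dict of {term: tf-idf weight}.
def rank(query_terms, tfidf):
    scores = {}
    for doc_id, weights in tfidf.items():
        # A document's score is the sum of its tf-idf weights for the query terms;
        # terms absent from the document contribute 0.
        scores[doc_id] = sum(weights.get(term, 0.0) for term in query_terms)
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```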
In the case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies cannot be negative. This remains true when using tf–idf weights.
The cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (for example, because one is much longer than the other), they may still be oriented close together. The smaller the angle, the higher the cosine similarity.
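A small sketch of that point (using NumPy; the toy term-count vectors are made up for illustration). A document and a three-times-longer copy of it are far apart in Euclidean terms but have a cosine similarity of 1, since they point in the same direction:

```python
import numpy as np

# Toy term-count vectors: `long_doc` is `doc` concatenated three times, so counts triple.
doc = np.array([2.0, 1.0, 0.0, 3.0])
long_doc = 3 * doc

# Cosine similarity: cos(theta) = (a . b) / (||a|| * ||b||)
cos = doc @ long_doc / (np.linalg.norm(doc) * np.linalg.norm(long_doc))
dist = np.linalg.norm(doc - long_doc)  # Euclidean distance

print(cos)   # ~1.0 -- same orientation, maximal similarity
print(dist)  # ~7.48 -- far apart by Euclidean distance
```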
TF-IDF gives us a way to associate each word in a document with a number that represents how relevant that word is in that document. Then documents with similar, relevant words will have similar vectors, which is what we are looking for in a machine learning algorithm.
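For instance, a minimal sketch with scikit-learn (assuming it is installed; the sample sentences are made up), showing that the two documents sharing vocabulary get a noticeably higher similarity than the unrelated one:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three toy documents; the first two share topical vocabulary.
docs = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stock markets fell sharply today",
]

# Each row of X is one document's vector of tf-idf weights.
X = TfidfVectorizer().fit_transform(docs)

# Pairwise cosine similarities; entries lie in [0, 1] since the weights are non-negative.
print(cosine_similarity(X))
```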
TF-IDF is the weighting used.
Cosine is the measure used.
You could use cosine without weighting, but the results are then usually worse. Jaccard works on sets, so it's not obvious how to incorporate tf-idf weights without turning it into a different measure, or into something that behaves much like Cosine.
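To illustrate the contrast (a sketch; the token sets are made up): Jaccard compares documents as plain sets of terms, so frequencies, and hence tf-idf weights, are discarded entirely:

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard similarity: |intersection| / |union| of the two term sets.
    return len(a & b) / len(a | b)

# Documents reduced to term sets: frequencies (and thus tf-idf weights) are lost.
d1 = {"the", "cat", "sat", "on", "mat"}
d2 = {"a", "cat", "lay", "on", "mat"}
print(jaccard(d1, d2))  # 3/7 ~= 0.43, based purely on shared vocabulary
```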