I have corpora of classified text. From these I create vectors; each vector corresponds to one document, and its components are word weights for that document computed as TF-IDF values. Next I build a model in which every class is represented by a single vector. The model has as many vectors as there are classes in the corpora. Each component of a model vector is computed as the mean of the corresponding component values taken from the vectors in that class. For unclassified vectors I determine similarity to a model vector by computing the cosine between the two vectors.
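(For reference, here is a rough sketch of that setup. It assumes scikit-learn's `TfidfVectorizer` and NumPy; the documents, labels, and variable names are just placeholders, not my actual data.)

```python
# Rough sketch: TF-IDF vectors, one mean (centroid) vector per class,
# and cosine similarity for classifying an unseen document.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["the cat sat on the mat", "dogs are loyal pets", "cats purr softly"]
train_labels = ["cat", "dog", "cat"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_docs).toarray()   # one TF-IDF vector per document

# Model: one centroid per class, each component is the mean over that class's vectors.
classes = sorted(set(train_labels))
centroids = np.array([X[np.array(train_labels) == c].mean(axis=0) for c in classes])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Classify an unseen document by the most similar class centroid.
query = vectorizer.transform(["a soft purring cat"]).toarray()[0]
scores = [cosine(query, c) for c in centroids]
print(classes[int(np.argmax(scores))])
```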
Questions:
1) Can I use the Euclidean distance between an unclassified vector and a model vector to compute their similarity?
2) Why can't Euclidean distance be used as a similarity measure instead of the cosine of the angle between two vectors, and vice versa?
Thanks!
Cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance because of their size (for example, the word 'cricket' appears 50 times in one document and 10 times in another), they can still have a small angle between them. The smaller the angle, the higher the similarity.
However, if the vectors are first normalized to unit length, cosine similarity is a bijective (and monotone) function of Euclidean distance, so there is no real theoretical advantage to one over the other; in practice, cosine similarity is simply faster to compute.
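Here is a quick numerical check of that relationship, assuming both vectors are first L2-normalized (for unit vectors, squared Euclidean distance equals 2 − 2·cosine similarity):

```python
# For L2-normalized vectors, ||u - v||^2 == 2 - 2 * cos(u, v),
# so one measure is a monotone function of the other.
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.random(5), rng.random(5)
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)   # normalize to unit length

cos = np.dot(u, v)
dist_sq = np.sum((u - v) ** 2)
print(np.isclose(dist_sq, 2 - 2 * cos))   # True
```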
While cosine similarity looks at the angle between vectors (and thus does not take their weight or magnitude into account), Euclidean distance is like using a ruler to actually measure the distance between the two points.
However, the Euclidean distance measure can be more effective in some situations. If three points A, B, and C all lie in the same direction from the origin, the cosine similarity between any pair of them is the same, but the Euclidean distance can still show that A and B are closer to each other than either is to C, and hence more similar to each other.
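For instance, a toy example along these lines (the points are made up for illustration):

```python
# A, B, and C all lie on the same ray from the origin: cosine similarity
# cannot tell them apart, but Euclidean distance can.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

A = np.array([1.0, 2.0])
B = np.array([2.0, 4.0])
C = np.array([10.0, 20.0])

print(cosine(A, B), cosine(A, C))                    # both 1.0
print(np.linalg.norm(A - B), np.linalg.norm(A - C))  # ~2.24 vs ~20.12
```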
One informal but rather intuitive way to think about this is to consider the 2 components of a vector: direction and magnitude.
Direction is the "preference" / "style" / "sentiment" / "latent variable" of the vector, while the magnitude is how strong it is towards that direction.
When classifying documents we'd like to categorize them by their overall sentiment, so we use the angular distance.
Euclidean distance is susceptible to documents being clustered by their L2-norm (their magnitude, i.e. distance from the origin) instead of by direction. That is, vectors pointing in quite different directions can end up clustered together simply because their distances from the origin are similar.
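A small illustration of that effect with made-up two-dimensional "document" vectors (the numbers are hypothetical word counts, not real data):

```python
# Two short documents on different topics vs. one long document on the first topic.
# Euclidean distance groups the two short documents (similar magnitude),
# while cosine similarity groups the documents that point in the same direction.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

short_topic_x = np.array([5.0, 0.0])
short_topic_y = np.array([0.0, 5.0])
long_topic_x  = np.array([50.0, 0.0])

print(np.linalg.norm(short_topic_x - short_topic_y))  # ~7.07 (Euclidean: "close")
print(np.linalg.norm(short_topic_x - long_topic_x))   # 45.0  (Euclidean: "far")
print(cosine(short_topic_x, short_topic_y))           # 0.0   (cosine: unrelated)
print(cosine(short_topic_x, long_topic_x))            # 1.0   (cosine: same topic)
```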