 

How to evaluate Word2Vec model

Hi, I have my own corpus and I train several Word2Vec models on it. What is the best way to evaluate them against each other and choose the best one? (Not manually, obviously - I am looking for various measures.)

It's worth noting that the embeddings are for items and not words, so I can't use any existing benchmarks.

Thanks!

asked Oct 04 '18 by oren_isp

People also ask

How do you evaluate an embedded word?

Word embeddings are widely used nowadays in Distributional Semantics and for a variety of tasks in NLP. Embeddings can be evaluated using extrinsic evaluation methods, i.e. the trained embeddings are evaluated on a specific task such as part-of-speech tagging or named-entity recognition (Schnabel et al., 2015).

How does Word2Vec measure similarity?

Therefore, Word2Vec can capture the similarity value between words from training on a large corpus. The resulting similarity value is obtained from the word vectors and calculated using the cosine similarity equation.
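
For illustration, a minimal sketch of that calculation with gensim; the model filename and item IDs below are purely hypothetical. It shows the built-in similarity call alongside the same cosine computation done by hand:

```python
from gensim.models import Word2Vec
import numpy as np

model = Word2Vec.load("items.w2v")  # hypothetical model file

# gensim's built-in cosine similarity between two tokens
print(model.wv.similarity("item_a", "item_b"))  # hypothetical item IDs

# the same value computed by hand from the raw vectors
a, b = model.wv["item_a"], model.wv["item_b"]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```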

How accurate is Word2Vec?

According to accuracy tests of the three word embeddings, FastText outperforms GloVe and Word2Vec on the 20 Newsgroups dataset: the accuracy is 97.2% for FastText, 95.8% for GloVe and 92.5% for Word2Vec.

How do I use Word2Vec model?

Preprocess/clean the text data, using NLTK. Use word2vec to create word and title embeddings, then visualize them as clusters using t-SNE. Visualize the relationship between title sentiment and article popularity. Attempt to predict article popularity from the embeddings and other available features.
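
As a rough, self-contained sketch of that pipeline (the tiny placeholder corpus and the parameter values are illustrative assumptions, and the sentiment/popularity steps are omitted):

```python
import nltk
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

nltk.download("punkt")  # tokenizer data for word_tokenize (recent NLTK releases may also need "punkt_tab")

# tiny placeholder corpus; in practice this would be the cleaned titles/articles
raw_docs = ["first example document about cats",
            "second example document about dogs"]
sentences = [nltk.word_tokenize(doc.lower()) for doc in raw_docs]

# train embeddings (gensim 4.x uses vector_size; older releases call it size)
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=20)

# project every vocabulary vector to 2-D for cluster plotting
coords = TSNE(n_components=2, perplexity=3).fit_transform(model.wv.vectors)
print(coords.shape)
```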


1 Answer

There's no generic way to assess token-vector quality if you're not even using real words, against which other tasks (like the popular analogy-solving) could be tried.

If you have a custom ultimate task, you have to devise your own repeatable scoring method. That will likely either be some subset of your actual final task, or well-correlated with that ultimate task. Essentially, whatever ad hoc method you may be using to 'eyeball' the results for sanity should be systematized, saving your judgements from each evaluation, so that they can be re-run against iterative model improvements.
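
For example, one very rough way to systematize such a check, assuming gensim and purely hypothetical item IDs and model filenames: keep a hand-curated list of item pairs you've judged to belong together, then score each candidate model by how often the second item of a pair shows up among the first item's nearest neighbours.

```python
from gensim.models import Word2Vec

# item pairs you have manually judged to belong together,
# collected while eyeballing earlier models (hypothetical IDs)
judged_pairs = [("item_123", "item_456"), ("item_789", "item_012")]

def score_model(model, pairs, topn=10):
    """Fraction of judged pairs where the second item appears in the first item's top-n neighbours."""
    hits = 0
    for a, b in pairs:
        if a not in model.wv or b not in model.wv:
            continue
        neighbours = [key for key, _ in model.wv.most_similar(a, topn=topn)]
        hits += b in neighbours
    return hits / len(pairs)

# hypothetical filenames for the candidate models being compared
candidates = {name: Word2Vec.load(name) for name in ["model_a.w2v", "model_b.w2v"]}
best = max(candidates, key=lambda name: score_model(candidates[name], judged_pairs))
print(best, score_model(candidates[best], judged_pairs))
```

The point isn't this particular metric - it's that the judgements are saved once and the whole comparison can be re-run unchanged every time you retrain.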

(I'd need more info about your data/items and ultimate goals to make further suggestions.)

answered Sep 20 '22 by gojomo