Hi, I have my own corpus, and I've trained several Word2Vec models on it. What is the best way to evaluate them against each other and choose the best one? (Not manually, obviously; I am looking for various measures.)
It's worth noting that the embeddings are for items, not words, so I can't use any existing benchmarks.
Thanks!
Word embeddings are widely used nowadays in Distributional Semantics and for a variety of tasks in NLP. Embeddings can be evaluated using extrinsic evaluation methods, i.e. the trained embeddings are evaluated on a specific task such as part-of-speech tagging or named-entity recognition (Schnabel et al., 2015).
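For item embeddings there is no POS-tagging or NER task, but the same extrinsic idea applies to any labeled downstream task you have. Below is a minimal sketch, assuming a hypothetical per-item label (e.g. a category) and embeddings stored as a plain `dict` of item → vector: each model's vectors feed a simple nearest-centroid classifier, and the model is scored by test accuracy. The function names and toy data are illustrative only, not from any library.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_centroid_accuracy(embeddings, train, test):
    """Extrinsic score: test accuracy of a nearest-centroid classifier whose
    only features are the embedding vectors.

    embeddings: dict mapping item -> vector (list of floats)
    train, test: lists of (item, label) pairs for a hypothetical downstream task
    """
    # Group training vectors by label and average them into one centroid per label.
    by_label = defaultdict(list)
    for item, label in train:
        if item in embeddings:
            by_label[label].append(embeddings[item])
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    if not centroids or not test:
        return 0.0

    correct = 0
    for item, label in test:
        vec = embeddings.get(item)
        if vec is None:
            # Out-of-vocabulary items count as misses, so a model with a
            # smaller vocabulary is not rewarded for skipping hard cases.
            continue
        # Predict the label whose centroid is closest in Euclidean distance.
        predicted = min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
        correct += predicted == label
    return correct / len(test)

# Toy usage: two well-separated groups of items.
model_a = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "b1": [0.0, 1.0], "b2": [0.1, 0.9]}
train = [("a1", "A"), ("b1", "B")]
test = [("a2", "A"), ("b2", "B")]
print(nearest_centroid_accuracy(model_a, train, test))  # 1.0
```

Running the same train/test split against every candidate model gives directly comparable scores; nearest-centroid is chosen here only because it needs no extra dependencies.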
Word2Vec can therefore capture the similarity between words by training on a large corpus: the similarity value is obtained by taking the words' vectors and computing their cosine similarity.
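To make the similarity step concrete, here is a minimal pure-Python sketch of cosine similarity between two word (or item) vectors. The vectors are plain lists for illustration; in gensim, the same quantity for two in-vocabulary keys is available as `model.wv.similarity(key1, key2)`.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||); ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Parallel vectors score (approximately) 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because it ignores vector length, cosine similarity compares only direction, which is why it is the usual choice for Word2Vec vectors.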
According to tests of the accuracy of the three word embeddings on the 20 Newsgroups dataset, FastText outperforms GloVe and Word2Vec: the accuracy is 97.2% for FastText, 95.8% for GloVe, and 92.5% for Word2Vec.
Preprocess/clean the text data using NLTK. Use word2vec to create word and title embeddings, then visualize them as clusters using t-SNE. Visualize the relationship between title sentiment and article popularity. Attempt to predict article popularity from the embeddings and other available features.
There's no generic way to assess token-vector quality if you're not even using real words, against which other tasks (like the popular analogy-solving) can be tried.
If you have a custom ultimate task, you have to devise your own repeatable scoring method. That will likely be either some subset of your actual final task or something well-correlated with that ultimate task. Essentially, whatever ad-hoc method you may be using to 'eyeball' the results for sanity should be systematized, saving your judgments from each evaluation, so that they can be run repeatedly against iterative model improvements.
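One possible shape for such a systematized check is sketched below, under stated assumptions: your saved judgments are a hypothetical list of `(query_item, expected_related_item)` pairs, each model is a plain `dict` of item → vector, and recall@k (how often the expected item shows up among the query's k nearest neighbors) is the score. Recall@k is just one reasonable metric picked for illustration; the helper names are made up. With gensim 4, one way to build such a dict is `{key: model.wv[key] for key in model.wv.index_to_key}`.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity; zero vectors get 0.0 instead of raising."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def top_k_neighbors(embeddings, query, k):
    """The k items most cosine-similar to `query`, excluding the query itself."""
    qvec = embeddings[query]
    others = (item for item in embeddings if item != query)
    return heapq.nlargest(k, others, key=lambda item: cosine(qvec, embeddings[item]))

def recall_at_k(embeddings, judgments, k=10):
    """Fraction of saved (query, expected) judgments whose expected item appears
    among the query's top-k neighbors. Missing items count as misses."""
    if not judgments:
        return 0.0
    hits = 0
    for query, expected in judgments:
        if query in embeddings and expected in embeddings:
            hits += expected in top_k_neighbors(embeddings, query, k)
    return hits / len(judgments)

def pick_best(models, judgments, k=10):
    """Score every candidate model on the same judgments; return (best_name, scores)."""
    scores = {name: recall_at_k(emb, judgments, k) for name, emb in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy usage: two hypothetical models trained with different settings.
judgments = [("item_a", "item_b"), ("item_c", "item_d")]
models = {
    "window5": {"item_a": [1.0, 0.1], "item_b": [0.9, 0.2],
                "item_c": [0.1, 1.0], "item_d": [0.2, 0.9]},
    "window2": {"item_a": [1.0, 0.1], "item_b": [0.1, 1.0],
                "item_c": [0.1, 1.0], "item_d": [1.0, 0.2]},
}
best, scores = pick_best(models, judgments, k=1)
print(best, scores)  # window5 {'window5': 1.0, 'window2': 0.0}
```

Keeping the judgments file fixed between runs is what makes scores comparable across model iterations. The brute-force neighbor search scans every item per query; for large vocabularies, gensim's `model.wv.most_similar(key, topn=k)` is the more practical way to get the neighbors.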
(I'd need more info about your data/items and ultimate goals to make further suggestions.)