Does the FastText algorithm use only words and subwords, or sentences too?

I read the paper and also googled to see whether there is any good example of the learning method (or, more precisely, the learning procedure).

For word2vec, suppose the corpus contains the sentence

I go to school with lunch box that my mother wrapped every morning

Then, with window size 2, it will try to obtain the vector for 'school' by using the surrounding words

['go', 'to', 'with', 'lunch']
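(Just to make the window concrete, here is a tiny Python sketch, not taken from either paper, that collects those window-2 context words; the sentence and window size are only this question's example.)

```python
# Toy sketch: collect the context words within a window of 2 around a target word.
sentence = "I go to school with lunch box that my mother wrapped every morning".split()
window = 2

target = sentence.index("school")
context = [sentence[i]
           for i in range(max(0, target - window), min(len(sentence), target + window + 1))
           if i != target]
print(context)  # ['go', 'to', 'with', 'lunch']
```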

Now, FastText says that it uses subwords to obtain the vector, so it definitely uses character n-gram subwords; for example, with n=3:

['sch', 'cho', 'hoo', 'ool', 'school']
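(In the actual fastText implementation the word is first wrapped in '<' and '>' boundary markers before the n-grams are extracted, and the whole word '<school>' is kept as an extra token. A minimal sketch, assuming only that convention:)

```python
# Toy sketch: fastText-style character n-grams, with '<' and '>' boundary markers.
def char_ngrams(word, min_n=3, max_n=3):
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams

print(char_ngrams("school"))
# ['<sc', 'sch', 'cho', 'hoo', 'ool', 'ol>']  plus the whole word '<school>' as its own token
```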

Up to here, I understand. But it is not clear whether the other words are also used when learning the vector for 'school'. I can only guess that the surrounding words are used as well, as in word2vec, since the paper mentions that

the terms Wc and Wt are both used in its functions,

where Wc is the context word and Wt is the word at position t.

However, it is not clear how FastText learns the vectors for words.

Please explain clearly, step by step, how the FastText learning procedure works.

More precisely, I want to know whether FastText follows the same procedure as word2vec while additionally learning the character n-gram subwords, or whether only the character n-gram subwords, together with the word itself, are used.

How are the subword vectors initialized at the start, etc.?

asked Apr 13 '18 by Isaac Sim



1 Answer

Any context word has its candidate input vector assembled from the combination of both its full-word token and all of its character n-grams. So if the context word is 'school' and you're using 3-to-4-character n-grams, the in-training input vector is a combination of the full-word vector for 'school' and all the n-gram vectors for ['sch', 'cho', 'hoo', 'ool', 'scho', 'choo', 'hool'].
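As a rough numpy sketch of that assembly (the lookup tables and dimensionality here are made up; the paper describes summing the constituent vectors, while some implementations average them):

```python
import numpy as np

dim = 4
rng = np.random.default_rng(0)

# Hypothetical lookup tables; in a real model these are learned parameters.
word_vectors = {"school": rng.normal(size=dim)}
ngram_vectors = {g: rng.normal(size=dim)
                 for g in ["sch", "cho", "hoo", "ool", "scho", "choo", "hool"]}

def word_representation(word, ngrams):
    # Full-word vector plus all of its n-gram vectors, combined here by summing.
    vecs = [word_vectors[word]] + [ngram_vectors[g] for g in ngrams]
    return np.sum(vecs, axis=0)

v_school = word_representation("school", ["sch", "cho", "hoo", "ool", "scho", "choo", "hool"])
```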

When that candidate vector is adjusted by training, all the constituent vectors are adjusted. (This is a little like how, in word2vec's CBOW mode, all the words of the single averaged context input vector get adjusted together when their ability to predict a single target output word is evaluated and improved.)
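To make "all the constituent vectors are adjusted" concrete, here is a deliberately oversimplified sketch: whatever gradient the training objective produces for the assembled vector is applied to the full-word vector and to every n-gram vector. (The gradient below is a random placeholder; real training derives it from the skip-gram/CBOW objective with negative sampling, which is omitted here.)

```python
import numpy as np

dim, lr = 4, 0.05
rng = np.random.default_rng(1)

# One shared set of parameters: the whole-word token plus its n-grams.
pieces = {g: rng.normal(size=dim)
          for g in ["<school>", "sch", "cho", "hoo", "ool", "scho", "choo", "hool"]}

grad = rng.normal(size=dim)   # placeholder for the objective's gradient
for g in pieces:
    pieces[g] -= lr * grad    # the full word and all of its n-grams move together
```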

As a result, those n-grams that happen to be meaningful hints across many similar words – for example, common word-roots or prefixes/suffixes – get positioned where they confer that meaning. (Other n-grams may remain mostly low-magnitude noise, because there's little meaningful pattern to where they appear.)

After training, reported vectors for individual in-vocabulary words are also constructed by combining the full-word vector and all of its n-gram vectors.

Then, when you encounter an out-of-vocabulary word, to the extent it shares some or many n-grams with morphologically similar in-training words, it will get a similar calculated vector – and thus be better than nothing for guessing what that word's vector should be. (And in the case of small typos or slight variants of known words, the synthesized vector may be pretty good.)
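For instance, with gensim's FastText wrapper (just a sketch: the corpus is made up and tiny, and the keyword arguments assume gensim 4.x), an out-of-vocabulary word still gets a vector assembled from whatever of its n-grams were seen during training:

```python
from gensim.models import FastText

# Tiny made-up corpus, only to show the OOV behaviour; real training needs far more text.
sentences = [
    ["i", "go", "to", "school", "with", "lunch", "box"],
    ["my", "mother", "wrapped", "lunch", "every", "morning"],
]

model = FastText(sentences, vector_size=32, window=2, min_count=1,
                 min_n=3, max_n=5, epochs=50)

print("schools" in model.wv.key_to_index)    # False: never appeared in training
vec = model.wv["schools"]                    # still works, built from shared n-grams
print(model.wv.similarity("school", "schools"))
```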

answered Oct 18 '22 by gojomo