
How to vectorize whole text using fasttext?

To get vector of a word, I can use:

model["word"]

But if I want the vector of a whole sentence, I have to either sum or average the vectors of all its words myself.

Does FastText provide a method to do this?

Andrey asked Apr 17 '17 16:04




1 Answer

If you want to compute vector representations of sentences or paragraphs, please use:

$ ./fasttext print-sentence-vectors model.bin < text.txt

This assumes that the text.txt file contains the paragraphs that you want to get vectors for. The program will output one vector representation per line in the file.
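To use those vectors from a script, the command's output can be redirected to a file and read back. The sketch below assumes each output line holds only the vector components separated by whitespace; the function name and the sample values are made up for illustration, so check the output of your fastText version first.

```python
def parse_sentence_vectors(lines):
    """Turn lines of whitespace-separated numbers into a list of float vectors.

    Assumption: every non-blank line is one sentence vector and contains
    nothing but its components. Blank lines are skipped.
    """
    vectors = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        vectors.append([float(field) for field in fields])
    return vectors


# Hypothetical sample standing in for the saved output of
# `./fasttext print-sentence-vectors model.bin < text.txt`.
sample_output = ["0.1 -0.2 0.3", "0.0 0.5 -1.0"]
vectors = parse_sentence_vectors(sample_output)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

With a real file, passing the open file object (for example `parse_sentence_vectors(open("vectors.txt"))`, where `vectors.txt` is a hypothetical name) should work the same way, since iterating a file yields its lines.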

This is documented in the README of the fastText repository: https://github.com/facebookresearch/fastText
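If you would rather do the averaging yourself, as described in the question, here is a minimal sketch in plain Python. The `word_vector` argument is any callable that maps a token to a vector; the toy dictionary below is a stand-in for a real model. The optional `normalize` flag divides each word vector by its L2 norm before averaging, which, as far as I know, is roughly what fastText does for unsupervised models; treat that detail as an assumption and compare against the real output.

```python
import math


def sentence_vector(tokens, word_vector, normalize=False):
    """Average the vectors of `tokens`, looked up through `word_vector`.

    With normalize=True each word vector is scaled to unit L2 norm first,
    and zero vectors are left out of the average.
    Returns an empty list if nothing could be averaged.
    """
    total = None
    count = 0
    for token in tokens:
        vec = [float(x) for x in word_vector(token)]
        if normalize:
            norm = math.sqrt(sum(x * x for x in vec))
            if norm == 0.0:
                continue
            vec = [x / norm for x in vec]
        if total is None:
            total = [0.0] * len(vec)
        total = [t + x for t, x in zip(total, vec)]
        count += 1
    if count == 0:
        return []
    return [t / count for t in total]


# Toy lookup table standing in for a trained model (hypothetical values).
toy_vectors = {"hello": [1.0, 0.0], "world": [0.0, 2.0]}

plain_average = sentence_vector(["hello", "world"], toy_vectors.__getitem__)
normalized_average = sentence_vector(
    ["hello", "world"], toy_vectors.__getitem__, normalize=True
)
print(plain_average, normalized_average)
```

With the official Python bindings you could pass `model.get_word_vector` as the lookup, and the bindings also offer `model.get_sentence_vector(text)`, which does the whole job for a single line of text.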

Aanchal1103 answered Sep 21 '22 18:09