
Is it possible to fine-tune FastText models?

I'm working on a project for text similarity using FastText. The basic example I have found to train a model is:

from gensim.models import FastText

model = FastText(tokens, size=100, window=3, min_count=1, iter=10, sorted_vocab=1)

As I understand it, since I'm specifying the vector and n-gram size, the model is being trained from scratch here, and if the dataset is small I wouldn't expect great results.

The other option I have found is to load the original Wikipedia model, which is a huge file:

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.simple')

My question is: can I load the Wikipedia model, or any other pretrained model, and fine-tune it with my dataset?


1 Answer

If you have a labelled dataset, then you should be able to fine-tune to it. This GitHub issue explains that you want to use the pretrainedVectors option: you would start with the Wikipedia pretrained vectors, then train on your own dataset. It seems that gensim can also continue training a loaded model, but according to this GH issue there have been some bugs.
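
For the gensim route, a minimal sketch of continued training looks roughly like the following (assuming gensim >= 3.8, the full wiki.simple.bin file rather than just the .vec vectors, and a hypothetical my_sentences list of tokenized sentences from your own corpus; the parameter choices are illustrative, not an official recipe):

from gensim.models.fasttext import load_facebook_model

# load the complete Facebook-format model; the .bin file is needed because
# the .vec file alone has no subword (n-gram) weights and cannot be trained further
model = load_facebook_model('wiki.simple.bin')

# my_sentences would be your own tokenized corpus, e.g. [['first', 'sentence'], ['second', 'one']]
model.build_vocab(my_sentences, update=True)  # grow the existing vocabulary with any new words
model.train(my_sentences, total_examples=len(my_sentences), epochs=model.epochs)

The pretrainedVectors option, by contrast, belongs to fastText's own supervised (classification) training, where the Wikipedia .vec file is passed as the starting point for a labelled-text classifier rather than used to continue the unsupervised embedding training.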



