
How to save a fastText model in vec format?

I trained an unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a vec file, since I will use this file for the pretrainedVectors parameter of fasttext.train_supervised(). pretrainedVectors only accepts a vec file, but I am having trouble creating one. Can someone help me?

P.S. I am able to save the model in bin format. It would also be helpful if you could suggest a way to convert a bin file to a vec file.

esin ildiz asked Oct 11 '19 08:10



2 Answers

To obtain a VEC file containing just the word vectors, I took inspiration from the official bin_to_vec example.

from fasttext import load_model

# placeholder paths: replace them with your own files
bin_path = "YOUR-BIN-MODEL-PATH"
vec_path = "YOUR-VEC-FILE-PATH"

# load the original BIN model
f = load_model(bin_path)

# get all words from the model
words = f.get_words()

# write as UTF-8 so that non-ASCII words are saved instead of raising an error
with open(vec_path, 'w', encoding='utf-8') as file_out:

    # the first line must contain the total number of words and the vector dimension
    file_out.write(str(len(words)) + " " + str(f.get_dimension()) + "\n")

    # line by line, append each word and its vector to the VEC file
    for w in words:
        v = f.get_word_vector(w)
        vstr = ""
        for vi in v:
            vstr += " " + str(vi)
        file_out.write(w + vstr + '\n')

The resulting VEC file can be large. To reduce its size, you can limit the precision of the vector components.

If you want to keep only 4 decimal digits, you can replace vstr += " " + str(vi) with
vstr += " " + "{:.4f}".format(vi)
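
The same formatting can be wrapped in a small helper that builds one line of the VEC file at a time. This is only a minimal sketch: format_vec_line and its decimals parameter are hypothetical names chosen for illustration, not part of fastText, and the vector here is a plain list rather than a real model output.

```python
def format_vec_line(word, vector, decimals=4):
    """Return one VEC-format line: the word followed by its rounded components."""
    # "{:.{}f}" takes the value first and the number of decimal digits second
    components = " ".join("{:.{}f}".format(float(x), decimals) for x in vector)
    return word + " " + components


# toy vector standing in for f.get_word_vector(w)
line = format_vec_line("hello", [0.123456, -1.5, 2.0])
print(line)  # hello 0.1235 -1.5000 2.0000
```

Fewer decimals give a smaller file at the cost of precision, so the right value depends on how sensitive the supervised model is to small changes in the vectors.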

Stefano Fiorucci - anakin87 answered Oct 23 '22 09:10



You should add the number of words and the vector dimension as the first line of your vec file, then use the -pretrainedVectors parameter.
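
One way to confirm that the header is right before training is to compare it with the rest of the file. The following is a rough sketch using only the standard library; check_vec_file is a hypothetical helper, not a fastText function, and the tiny file it checks is made up for the demonstration.

```python
import os
import tempfile


def check_vec_file(path):
    """Return (n_words, dim) from the header, raising ValueError on a mismatch."""
    with open(path, encoding="utf-8") as fh:
        # header line: "<number of words> <vector dimension>"
        n_words, dim = (int(x) for x in fh.readline().split())
        n_lines = 0
        for line in fh:
            parts = line.split()
            # each line must hold one word plus dim vector components
            if len(parts) != dim + 1:
                raise ValueError("line %d has %d components, expected %d"
                                 % (n_lines + 2, len(parts) - 1, dim))
            n_lines += 1
    if n_lines != n_words:
        raise ValueError("header says %d words, file has %d" % (n_words, n_lines))
    return n_words, dim


# made-up example file: 2 words, 3 dimensions
tmp_dir = tempfile.mkdtemp()
vec_path = os.path.join(tmp_dir, "toy.vec")
with open(vec_path, "w", encoding="utf-8") as out:
    out.write("2 3\n")
    out.write("hello 0.1 0.2 0.3\n")
    out.write("world 0.4 0.5 0.6\n")

result = check_vec_file(vec_path)
print(result)  # (2, 3)
```

The dimension reported in the header also has to match the dimension used for the supervised model, so the returned value can be compared against that setting.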

darwin007 answered Oct 23 '22 10:10
