I trained an unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a vec file, since I will pass this file to the pretrainedVectors parameter of the fasttext.train_supervised() function. pretrainedVectors only accepts a vec file, but I am having trouble creating one. Can someone help me?
P.S. I am able to save the model in bin format. It would also be helpful if you could suggest a way to convert a bin file to a vec file.
To obtain a VEC file containing only the word vectors, I took inspiration from the official bin_to_vec example.
from fasttext import load_model

# Load the original BIN model (replace the placeholder with your own path)
f = load_model("YOUR-BIN-MODEL-PATH")

# Get all words from the model
words = f.get_words()

with open("YOUR-VEC-FILE-PATH", "w", encoding="utf-8") as file_out:
    # The first line must contain the total number of words and the vector dimension
    file_out.write(str(len(words)) + " " + str(f.get_dimension()) + "\n")
    # Line by line, append each word and its vector to the VEC file
    for w in words:
        v = f.get_word_vector(w)
        vstr = ""
        for vi in v:
            vstr += " " + str(vi)
        try:
            file_out.write(w + vstr + "\n")
        except UnicodeEncodeError:
            # Skip words that cannot be encoded; note that every skipped word
            # leaves the count on the first line one too high
            pass
The resulting VEC file can be large. To reduce its size, you can adjust the format of the vector components.
If you want to keep only 4 decimal digits, you can replace vstr += " " + str(vi)
with
vstr += " " + "{:.4f}".format(vi)
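As a rough sketch of the same idea, the line for one word can also be built with str.join instead of repeated string concatenation. The helper name format_vec_line and its decimals parameter are hypothetical and not part of fastText; the sketch only assumes the vector is an iterable of numbers.

```python
def format_vec_line(word, vector, decimals=4):
    """Return one VEC-file line: the word followed by its rounded components."""
    # Hypothetical helper for illustration; not part of the fastText API.
    components = " ".join("{:.{}f}".format(float(x), decimals) for x in vector)
    return word + " " + components


# Example with a made-up three-dimensional vector
print(format_vec_line("hello", [0.123456, -1.5, 2.0]))
# prints: hello 0.1235 -1.5000 2.0000
```

Inside the loop above, file_out.write(format_vec_line(w, v) + "\n") would then take the place of the inner for loop.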
You should put the number of words and the vector dimension on the first line of your vec file, then use the -pretrainedVectors parameter.
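To make the first-line requirement concrete, here is a minimal sketch that reads the header of a vec file and checks it against the rest of the file. The function name check_vec_header and the demo file are hypothetical; only the layout (a first line with the word count and the dimension, then one word and its components per line) comes from the answers here.

```python
import os
import tempfile


def check_vec_header(path):
    """Return (num_words, dim) from a vec file, or raise ValueError if the body disagrees."""
    # Hypothetical helper for illustration; not part of the fastText API.
    with open(path, encoding="utf-8") as f:
        num_words, dim = (int(x) for x in f.readline().split())
        rows = 0
        for line in f:
            parts = line.split()  # the word, then its vector components
            if len(parts) != dim + 1:
                raise ValueError(
                    "row %d has %d components, expected %d"
                    % (rows + 1, len(parts) - 1, dim)
                )
            rows += 1
    if rows != num_words:
        raise ValueError("header says %d words, file has %d" % (num_words, rows))
    return num_words, dim


# Demo on a tiny, made-up vec file with 2 words of dimension 3
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vec")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write("2 3\n")
        out.write("hello 0.1000 0.2000 0.3000\n")
        out.write("world 0.4000 0.5000 0.6000\n")
    print(check_vec_header(demo_path))
    # prints: (2, 3)
```

In the Python API the corresponding argument is pretrainedVectors in fasttext.train_supervised(). The dim argument passed there should equal the dimension reported in the header, because fastText rejects pretrained vectors whose size differs from the model dimension.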