In Gensim's documentation, it says:
You can save trained models to disk and later load them back, either to continue training on new training documents or to transform new documents.
I would like to do this with a dictionary, a corpus and a tf-idf model. However, the documentation only says that this is possible, without explaining how to save these objects and load them back again.
How do you do this?
I've been using pickle, but I don't know if this is right...
import pickle
with open("tfidf.p", "wb") as f:
    pickle.dump(tfidf, f)
with open("tfidf.p", "rb") as f:
    tfidf_reloaded = pickle.load(f)
In general, you can save things with generic Python pickle, but most gensim models support their own native .save() method.
It takes a target filesystem path and saves the model more efficiently than pickle, often by placing large component arrays in separate files alongside the main file. (When you later move the saved model, keep all the files that share the same root name together.)
In particular, some models with multi-gigabyte subcomponents may not save at all with pickle, but gensim's native .save() will work.
Models saved with .save() can typically be loaded using the appropriate class's .load() method. (For example, if you've saved an instance of gensim.corpora.dictionary.Dictionary, you'd load it with gensim.corpora.dictionary.Dictionary.load(filepath).)
Saving the Dict and Corpus to disk
dictionary.save(DICT_PATH)
corpora.MmCorpus.serialize(CORPUS_PATH, corpus)
Loading the Dict and Corpus from disk
loaded_dict = corpora.Dictionary.load(DICT_PATH)
loaded_corp = corpora.MmCorpus(CORPUS_PATH)