How do you save a model, dictionary and corpus to disk in Gensim, and then load them again?

Question

In Gensim's documentation, it says:

You can save trained models to disk and later load them back, either to continue training on new training documents or to transform new documents.

I would like to do this with a dictionary, a corpus and a TF-IDF model. However, the documentation only says that this is possible, without explaining how to save these things and load them back again.

How do you do this?

I've been using Pickle, but don't know if this is right...

import pickle
with open("tfidf.p", "wb") as f:
    pickle.dump(tfidf, f)
with open("tfidf.p", "rb") as f:
    tfidf_reloaded = pickle.load(f)

gojomo · Accepted Answer

In general, you can save things with generic Python pickle, but most gensim models support their own native .save() method.

It takes a target filesystem path and saves the model more efficiently than pickle, often by placing large component arrays in separate files alongside the main file. (If you later move the saved model, keep all of these files, which share the same root name, together.)

In particular, some models with multi-gigabyte subcomponents may not save at all with pickle, but gensim's native .save() will work.

Models saved with .save() can typically be loaded with the appropriate class's .load() method. (For example, if you've saved an instance of gensim.corpora.dictionary.Dictionary, you'd load it with gensim.corpora.dictionary.Dictionary.load(filepath).)

BHA Bilel · Answer

Saving the Dict and Corpus to disk

from gensim import corpora

dictionary.save(DICT_PATH)
corpora.MmCorpus.serialize(CORPUS_PATH, corpus)

Loading the Dict and Corpus from disk

loaded_dict = corpora.Dictionary.load(DICT_PATH)
loaded_corp = corpora.MmCorpus(CORPUS_PATH)
