
Online learning of LDA model in Spark

Is there a way to train an LDA model in an online-learning fashion, i.e. loading a previously trained model and updating it with new documents?

asked Mar 08 '17 by mathieu

2 Answers

Answering myself: it is not possible as of now.

Actually, Spark has two optimizers for LDA model training, and one of them is OnlineLDAOptimizer. This approach is specifically designed to incrementally update the model with mini-batches of documents.

The optimizer implements the online variational Bayes LDA algorithm, which processes a subset of the corpus on each iteration and updates the term-topic distribution adaptively.

Original Online LDA paper: Hoffman, Blei and Bach, "Online Learning for Latent Dirichlet Allocation." NIPS, 2010.
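For reference, here is a minimal sketch of how the online optimizer is selected when training with the RDD-based mllib API; the parameter values (number of topics, mini-batch fraction, iteration count) are arbitrary placeholders:

```scala
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// corpus: RDD of (documentId, term-count vector)
def trainOnlineLda(corpus: RDD[(Long, Vector)]) = {
  new LDA()
    .setK(10)                          // number of topics (placeholder)
    .setMaxIterations(50)
    .setOptimizer(new OnlineLDAOptimizer()
      .setMiniBatchFraction(0.05))     // fraction of the corpus sampled per iteration
    .run(corpus)                       // with the online optimizer this returns a LocalLDAModel
}
```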

Unfortunately, the current mllib API does not allow loading a previously trained LDA model and feeding it an additional batch of documents.

Some mllib models support an initialModel as a starting point for incremental updates (see KMeans or GMM), but LDA does not currently support that. I filed a JIRA for it: SPARK-20082. Please upvote ;-)
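To illustrate the contrast, here is a hedged sketch of what warm-starting looks like with mllib's KMeans via setInitialModel; the model path is a placeholder, and no equivalent hook exists on LDA today:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Resume training from a previously saved model (the path is a placeholder).
def updateKMeans(sc: SparkContext, newData: RDD[Vector]): KMeansModel = {
  val previous = KMeansModel.load(sc, "hdfs:///models/kmeans-previous")
  new KMeans()
    .setK(previous.k)
    .setInitialModel(previous)   // warm start: reuse centers from the previous model
    .run(newData)
}
```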

For the record, there is also a JIRA for streaming LDA: SPARK-8696.

answered Oct 13 '22 by mathieu


I don't think such a thing exists. LDA is a probabilistic parameter estimation algorithm (a very simplified explanation of the process: LDA explained), and adding even a few documents would change all previously computed probabilities, so the model effectively has to be recomputed.

I don't know your use case, but if your model converges in a reasonable time, you could consider updating it in batches: retrain periodically and discard some of the oldest documents at each re-computation to keep the estimation fast, as sketched below.
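As a rough sketch of that idea, assuming each document carries a timestamp, one could trim the corpus to a recent window before each full retraining; the names, schema, and window size below are illustrative:

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// docs: (documentId, (timestampMs, termCounts)) -- schema assumed for illustration
def retrainOnRecentWindow(docs: RDD[(Long, (Long, Vector))],
                          nowMs: Long,
                          windowMs: Long = 30L * 24 * 3600 * 1000) = {
  val recent = docs
    .filter { case (_, (ts, _)) => nowMs - ts <= windowMs }  // discard the oldest documents
    .map { case (id, (_, counts)) => (id, counts) }
  new LDA().setK(10).setMaxIterations(50).run(recent)        // recompute the model from scratch
}
```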

answered Oct 13 '22 by ML_TN