I am looking for the differences between LDA and NTM. What are some use cases where you would choose LDA over NTM?
As per the AWS documentation:
LDA: The Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. LDA is most commonly used to discover a user-specified number of topics shared by documents within a text corpus.
Although you can use both the Amazon SageMaker NTM and LDA algorithms for topic modeling, they are distinct algorithms and can be expected to produce different results on the same input data.
LDA and NTM are based on different underlying models:
The SageMaker LDA model (Latent Dirichlet Allocation, not to be confused with Linear Discriminant Analysis) works by assuming that documents are formed by sampling words from a finite set of topics. It has two moving parts: (1) the word composition per topic and (2) the topic composition per document.
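To make those two moving parts concrete, here is a minimal, purely illustrative sketch of LDA's generative assumption in plain NumPy (not the SageMaker implementation; all variable names and sizes are made up):

```python
# Illustrative sketch of LDA's generative assumption (not SageMaker's implementation).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_topics, doc_len = 6, 2, 20

# (1) word composition per topic: one distribution over the vocabulary per topic
beta = rng.dirichlet(np.ones(vocab_size), size=num_topics)   # shape (num_topics, vocab_size)

# (2) topic composition for a single document
theta = rng.dirichlet(np.ones(num_topics))                   # shape (num_topics,)

# generate a document: for each word, pick a topic from theta, then a word from that topic
topics = rng.choice(num_topics, size=doc_len, p=theta)
words = np.array([rng.choice(vocab_size, p=beta[t]) for t in topics])
print("topic mixture:", theta)
print("sampled word ids:", words)
```

Training LDA essentially runs this story in reverse: given only the observed words, it infers plausible values for beta and theta.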
SageMaker NTM, on the other hand, does not explicitly learn a word distribution per topic. It is a neural network that passes documents through a bottleneck layer and tries to reproduce the input document (presumably a variational autoencoder (VAE), according to the AWS documentation). This means the bottleneck layer ends up containing all the information needed to predict the document's composition, and its coefficients can be read as topics.
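For intuition on the bottleneck idea, here is a toy forward pass (an illustration of the general autoencoder principle only, not the actual NTM/VAE architecture, which is trained with backpropagation and a proper inference network):

```python
# Toy bottleneck "autoencoder" forward pass; weights are untrained and only show the shapes.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_topics = 6, 2
doc_bow = np.array([3., 0., 5., 1., 0., 2.])    # bag-of-words counts for one document

W_enc = rng.normal(size=(vocab_size, num_topics))
W_dec = rng.normal(size=(num_topics, vocab_size))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# bottleneck activations: a distribution over num_topics, i.e. the "topic mixture"
topic_mix = softmax(doc_bow @ W_enc)

# reconstruction: a distribution over the vocabulary; training pushes this toward the input
reconstruction = softmax(topic_mix @ W_dec)

print("topic mixture:", topic_mix)
print("reconstructed word distribution:", reconstruction)
```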
Here are considerations for choosing one or the other: SageMaker NTM can train on GPU instances and scale out across multiple instances, while SageMaker LDA currently only supports single-instance CPU training (for example, on ml.c4.xlarge instances).
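As a rough sketch of what that difference looks like with the SageMaker Python SDK (v2), using the generic Estimator and the built-in algorithm images; the instance types, hyperparameter values, and channel setup are illustrative assumptions, so check the current AWS docs before running:

```python
# Sketch of launching both built-in algorithms; role ARN, S3 paths, and values are placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/SageMakerRole"   # placeholder role ARN

# LDA: single CPU instance only
lda = Estimator(
    image_uri=image_uris.retrieve("lda", region),
    role=role,
    instance_count=1,                 # LDA training is single-instance
    instance_type="ml.c4.xlarge",     # CPU instance
    sagemaker_session=session,
)
lda.set_hyperparameters(num_topics=20, feature_dim=5000, mini_batch_size=128)

# NTM: can use GPUs and scale out to multiple instances
ntm = Estimator(
    image_uri=image_uris.retrieve("ntm", region),
    role=role,
    instance_count=2,                 # distributed training is supported
    instance_type="ml.p3.2xlarge",    # GPU instance
    sagemaker_session=session,
)
ntm.set_hyperparameters(num_topics=20, feature_dim=5000)

# lda.fit({"train": "s3://my-bucket/topics/train"})   # training data in protobuf RecordIO
# ntm.fit({"train": "s3://my-bucket/topics/train"})
```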