What is strange is that fit and partial_fit seem to run exactly the same code.
You can see the code at the following link:
https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L478
The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning.
partial_fit is a handy API for incremental learning on mini-batches of a dataset that does not fit in memory. The primary purpose of warm_start is to reduce training time when fitting the same dataset with different sets of hyperparameter values.
Number of topics: n_components is the number of topics to find in the corpus. Maximum number of iterations: max_iter is the maximum number of iterations the LDA algorithm is allowed in order to converge.
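As a minimal sketch of these two parameters, assuming a recent scikit-learn release where the class is sklearn.decomposition.LatentDirichletAllocation (older releases called the first parameter n_topics). The random count matrix is made up purely for illustration:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus: 100 documents, 20 vocabulary terms, random counts.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 20))

# n_components is the number of topics; max_iter bounds the passes made by fit().
lda = LatentDirichletAllocation(n_components=3, max_iter=5, random_state=0)
lda.fit(X)  # unsupervised learning: a single array, no target array

# components_ holds one row of term weights per topic.
print(lda.components_.shape)
print(lda.n_iter_)  # number of passes actually performed
```

On a real corpus X would normally come from something like CountVectorizer rather than random integers.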
Not exactly the same code; partial_fit uses total_samples:
"total_samples : int, optional (default=1e6) Total number of documents. Only used in the partial_fit method."
https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L184
(partial fit) https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L472
(fit) https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L510
Just in case it is of interest to you: partial_fit is a good candidate whenever your dataset is really big. Instead of running into possible memory problems, you perform the fitting in smaller batches, which is called incremental learning.
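The mini-batch idea can be sketched roughly as follows. This is only an illustration: the data is a random toy matrix held in memory, and the batch size of 50 rows is an arbitrary assumption. In a real out-of-memory setting each batch would be loaded from disk or a stream instead.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy document-term counts: 200 documents, 30 terms.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 30))
n_docs = X.shape[0]
batch_rows = 50  # arbitrary mini-batch size chosen for this sketch

# Tell the model how many documents exist in total, since partial_fit uses it.
lda = LatentDirichletAllocation(n_components=5, total_samples=n_docs,
                                random_state=0)

# Feed the data one slice at a time; each call updates the same model.
for start in range(0, n_docs, batch_rows):
    lda.partial_fit(X[start:start + batch_rows])

doc_topics = lda.transform(X)  # per-document topic weights
print(doc_topics.shape)
```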
So, in your case, you should take into account that the default value of total_samples is 1000000.0. Therefore, if you don't change this number and your real number of samples is bigger, you'll get different results from the fit method and partial_fit. Or it could be that the mini-batches you pass to partial_fit do not cover all the samples that you provide to the fit method. And even if you do this right, you could still get different results, as stated in the documentation:
sklearn documentation: https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning
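To check how much total_samples matters on your own data, a comparison along these lines may help. It is a rough sketch on a made-up random matrix; the helper fit_in_batches is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy document-term counts: 200 documents, 30 terms.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 30))
n_docs = X.shape[0]


def fit_in_batches(total_samples, batch_rows=50):
    """Hypothetical helper: train with partial_fit over consecutive slices."""
    lda = LatentDirichletAllocation(n_components=4,
                                    total_samples=total_samples,
                                    random_state=0)
    for start in range(0, n_docs, batch_rows):
        lda.partial_fit(X[start:start + batch_rows])
    return lda


lda_default = fit_in_batches(total_samples=1e6)     # library default
lda_matched = fit_in_batches(total_samples=n_docs)  # real document count
lda_fit = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)

# Same seed and same batches, so any difference comes from total_samples.
same = np.allclose(lda_default.components_, lda_matched.components_)
print("identical topics for default vs. matched total_samples:", same)

# fit() and partial_fit() may still disagree even with a matched count.
print(np.abs(lda_fit.components_ - lda_matched.components_).max())
```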