What is strange is that fit and partial_fit seem to run exactly the same code. You can see the code at the following link: https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L478

Why do the fit and partial_fit methods of sklearn's LatentDirichletAllocation return different results?

1 Answer

Not exactly the same code; partial_fit uses total_samples:

"total_samples : int, optional (default=1e6) Total number of documents. Only used in the partial_fit method."

(total_samples docstring) https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L184

(partial_fit) https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L472

(fit) https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/decomposition/online_lda.py#L510
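A minimal sketch to check this on your own install, assuming a scikit-learn version where LatentDirichletAllocation takes n_components (older releases used n_topics). The random count matrix is a hypothetical stand-in for a document-term matrix. fit is pinned to learning_method="online" and max_iter=1 so that it makes a single online pass, like one partial_fit call; with its defaults, fit would instead run up to 10 iterations, and since scikit-learn 0.20 the default learning_method is "batch", which partial_fit never uses. Under those assumptions, the two should only agree when total_samples equals the real number of documents.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus: 100 "documents" x 30 "words" of random counts.
rng = np.random.RandomState(0)
X = rng.poisson(lam=1.0, size=(100, 30))
n_samples = X.shape[0]

# Shared settings: online updates (what partial_fit always does) and a
# single pass over X in fit, to match one partial_fit call.
common = dict(n_components=5, learning_method="online", batch_size=20,
              max_iter=1, random_state=0)

# fit() scales each mini-batch update with the real number of samples in X.
lda_fit = LatentDirichletAllocation(**common).fit(X)

# partial_fit() with the default total_samples (1e6).
lda_default = LatentDirichletAllocation(**common).partial_fit(X)

# partial_fit() told the real number of documents.
lda_matched = LatentDirichletAllocation(total_samples=n_samples, **common).partial_fit(X)

same_default = np.allclose(lda_fit.components_, lda_default.components_)
same_matched = np.allclose(lda_fit.components_, lda_matched.components_)
print("default total_samples matches fit:", same_default)
print("total_samples=n_samples matches fit:", same_matched)
```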

Just in case it is of interest: partial_fit is a good choice whenever your dataset is very large. Instead of running into memory problems, you fit the model in smaller batches, which is called incremental learning.
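A minimal sketch of that pattern, assuming the corpus can be read in chunks. iter_chunks is a hypothetical helper that here just generates random word counts, standing in for something that reads one block of a document-term matrix at a time. total_samples is set to the full corpus size up front, because partial_fit cannot infer it from a single chunk. Note that partial_fit further splits each chunk into mini-batches of batch_size documents (128 by default).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation


def iter_chunks(n_docs, n_features, chunk_size, seed=0):
    """Hypothetical loader: yields the corpus one chunk of documents at a time.
    Here it just generates random word counts; in practice it would read a
    slice of a (sparse) document-term matrix from disk."""
    rng = np.random.RandomState(seed)
    for start in range(0, n_docs, chunk_size):
        n_rows = min(chunk_size, n_docs - start)
        yield rng.poisson(lam=1.0, size=(n_rows, n_features))


N_DOCS, N_FEATURES, CHUNK = 1000, 50, 200

lda = LatentDirichletAllocation(
    n_components=10,
    total_samples=N_DOCS,  # the full corpus size, which partial_fit cannot infer
    random_state=0,
)

# Only one chunk is held in memory at a time.
for chunk in iter_chunks(N_DOCS, N_FEATURES, CHUNK):
    lda.partial_fit(chunk)

print(lda.components_.shape)  # (10, 50): one row of word weights per topic
```

Every chunk must have the same columns (the same vocabulary); otherwise partial_fit raises a ValueError about the number of features.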

So, in your case, keep in mind that the default value of total_samples is 1000000.0 (1e6). partial_fit uses it to scale each mini-batch's contribution up to the size of the whole corpus, whereas fit uses the actual number of samples in X. Therefore, if you leave this default and your real number of documents differs from it, fit and partial_fit will give you different results. Another possibility is that you are calling partial_fit on mini-batches that do not cover all the samples you pass to fit; note also that fit makes up to max_iter passes over the data, while each partial_fit call makes only one. And even if you get all of this right, you could still get different results, as stated in the documentation:

- "the incremental learner itself may be unable to cope with new/unseen targets classes. In this case you have to pass all the possible classes to the first partial_fit call using the classes= parameter."
- "[...] choosing a proper algorithm is that all of them don’t put the same importance on each example over time [...]"

sklearn documentation: https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning
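A minimal sketch of the coverage point, on a hypothetical random corpus and assuming a scikit-learn version where LatentDirichletAllocation takes n_components. With learning_method="online", fit makes max_iter passes over X, each split into mini-batches of batch_size documents, while each partial_fit call makes a single pass over whatever it receives. If evaluate_every keeps its default (so fit never stops early) and total_samples equals the number of documents, calling partial_fit once per pass on all of X should reproduce fit, whereas one call on half of the documents should not.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus of random word counts.
rng = np.random.RandomState(42)
X = rng.poisson(lam=1.0, size=(120, 40))
n_samples = X.shape[0]
n_passes = 5

# total_samples is ignored by fit() and used by partial_fit().
params = dict(n_components=4, learning_method="online", batch_size=30,
              random_state=0, total_samples=n_samples)

# fit(): n_passes full passes over X (evaluate_every=-1 by default, so no early stop).
lda_fit = LatentDirichletAllocation(max_iter=n_passes, **params).fit(X)

# partial_fit(): one pass per call, so call it once per pass over all of X.
lda_all = LatentDirichletAllocation(**params)
for _ in range(n_passes):
    lda_all.partial_fit(X)

# partial_fit() on only half of the documents, once.
lda_half = LatentDirichletAllocation(**params).partial_fit(X[: n_samples // 2])

same_all = np.allclose(lda_fit.components_, lda_all.components_)
same_half = np.allclose(lda_fit.components_, lda_half.components_)
print("all documents, same number of passes:", same_all)
print("half of the documents, one pass:", same_half)
```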

answered Sep 21 '22 13:09

Guiem Bosch

Tags:

python

scikit-learn

augustin-barillec