I have a very large dataset that does not fit in memory. I would like to train my model on data stored on disk, for example in HDF5 or a similar format. Does sklearn support this, or is there an alternative?
Linear discriminant analysis, as you may be able to guess, is a linear classification algorithm and best used when the data has a linear relationship.
In the scikit-learn tutorial, clf is short for classifier: "We call our estimator instance clf, as it is a classifier."
What you are asking for is called out-of-core (or streaming) learning. In scikit-learn it is only possible with the subset of models that implement the partial_fit method for incremental fitting.
There is an example in the documentation. There is no specific utility for fitting models on HDF5 data in particular, but you can adapt that example to fetch the data from any external data source (e.g. HDF5 data on the local disk, or a database over the network, for instance using the pandas SQL adapter).
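As a rough illustration of the partial_fit approach, here is a minimal sketch rather than the documentation's own example. It assumes the data is stored in a disk-backed array that supports slicing; a NumPy memmap filled with synthetic data stands in for the real dataset. The helper names iter_batches and fit_out_of_core, the file names, and the batch size are all hypothetical. SGDClassifier is used because it is one of the estimators that implements partial_fit.

```python
import os
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_batches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices of two sliceable arrays."""
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        # np.asarray materialises only this slice as an in-memory array.
        yield np.asarray(X[start:stop]), np.asarray(y[start:stop])


def fit_out_of_core(X, y, classes, batch_size=1000):
    """Train a linear classifier incrementally, one batch at a time."""
    clf = SGDClassifier(random_state=0)
    for X_batch, y_batch in iter_batches(X, y, batch_size):
        # partial_fit needs the full list of classes on the first call,
        # because a single batch may not contain every label.
        clf.partial_fit(X_batch, y_batch, classes=classes)
    return clf


# Synthetic stand-in for a large on-disk dataset (assumption: real data
# would already exist on disk in some sliceable format).
rng = np.random.default_rng(0)
n_samples, n_features = 5000, 10
tmpdir = tempfile.mkdtemp()
X_path = os.path.join(tmpdir, "X.dat")
y_path = os.path.join(tmpdir, "y.dat")

X_disk = np.memmap(X_path, dtype="float64", mode="w+",
                   shape=(n_samples, n_features))
y_disk = np.memmap(y_path, dtype="int64", mode="w+", shape=(n_samples,))
X_disk[:] = rng.normal(size=(n_samples, n_features))
y_disk[:] = (X_disk[:, 0] > 0).astype("int64")  # label depends on feature 0
X_disk.flush()
y_disk.flush()

# Reopen read-only, as one would with a pre-existing file.
X_ro = np.memmap(X_path, dtype="float64", mode="r",
                 shape=(n_samples, n_features))
y_ro = np.memmap(y_path, dtype="int64", mode="r", shape=(n_samples,))

clf = fit_out_of_core(X_ro, y_ro, classes=np.array([0, 1]), batch_size=1000)
accuracy = clf.score(np.asarray(X_ro[:1000]), np.asarray(y_ro[:1000]))
print(f"accuracy on the first 1000 rows: {accuracy:.3f}")
```

An h5py dataset can be sliced the same way and returns NumPy arrays, so it should be possible to pass one to iter_batches in place of the memmap, although that is not tested here. Note that this makes a single pass over the data; several passes, ideally with the batch order shuffled, may be needed for the model to converge on real data.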