The scikit-learn library has the following classifiers, which look similar:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier
Are they essentially the same, or different? If they are different, how different are the two implementations? And, given a logistic regression problem, how do you decide which one to use?
SGD allows minibatch (online/out-of-core) learning via the partial_fit method. For best results using the default learning rate schedule, the data should have zero mean and unit variance. This implementation works with data represented as dense or sparse arrays of floating point values for the features.
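For illustration, here is a minimal out-of-core sketch using `partial_fit` (the synthetic data and batch loop are made up; in practice each batch would be read from disk or a stream, and `loss="log_loss"` is spelled `loss="log"` in scikit-learn versions before 1.1):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(loss="log_loss")  # logistic-regression loss

classes = np.array([0, 1])  # partial_fit needs the full set of classes up front
for _ in range(10):  # pretend each iteration is one minibatch loaded from disk
    X_batch = rng.randn(100, 5)  # already ~zero mean, unit variance, as recommended
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict(rng.randn(3, 5)))
```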
The solvers implemented in the class LogisticRegression are “liblinear”, “newton-cg”, “lbfgs”, “sag” and “saga”. In a nutshell: the “saga” solver is often the best choice, while the “liblinear” solver is used by default for historical reasons.
SGDClassifier supports multi-class classification by combining multiple binary classifiers in a “one versus all” (OVA) scheme. For each of the classes, a binary classifier is learned that discriminates between that and all other classes.
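You can see the OVA scheme in the fitted attributes: with K classes, the model stores one weight vector per class. A toy example (using the iris dataset just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
clf = SGDClassifier(loss="log_loss", max_iter=1000).fit(X, y)

# One binary classifier per class: coef_ has shape (n_classes, n_features).
print(clf.coef_.shape)       # (3, 4)
print(clf.intercept_.shape)  # (3,)
```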
SGDClassifier is a linear classifier (SVM, logistic regression, among others) optimized by SGD. These are two different concepts: SGD is an optimization method, while logistic regression and the linear support vector machine are machine learning models.
LogisticRegression in sklearn doesn't have an 'sgd' solver, though. It implements regularized logistic regression: it minimizes the log loss (the negative log-likelihood), with an L2 penalty by default.
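Concretely, roughly following the scikit-learn user guide's formulation for the L2 penalty, the objective being minimized is

$$\min_{w,\,c}\ \tfrac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (x_i^T w + c)\right) + 1\right)$$

where $y_i \in \{-1, 1\}$ and $C$ is the inverse regularization strength.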
SGDClassifier is a generalized linear classifier that uses Stochastic Gradient Descent as a solver. As mentioned here http://scikit-learn.org/stable/modules/sgd.html : "Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning." It is easy to implement and efficient; for example, SGD is also one of the optimizers used to train neural networks.
With SGDClassifier you can use many different loss functions (the function minimized to find the optimal solution), which lets you "tune" your model and find the best SGD-based linear model for your data. Indeed, different data structures or problems call for different loss functions, as the sketch below shows.
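For example (hyperparameters left at their defaults for brevity), switching the loss changes which linear model SGD fits:

```python
from sklearn.linear_model import SGDClassifier

# Same optimizer, different models depending on the loss:
log_reg_sgd = SGDClassifier(loss="log_loss")       # logistic regression
linear_svm = SGDClassifier(loss="hinge")           # linear SVM (the default loss)
huber_like = SGDClassifier(loss="modified_huber")  # smoothed hinge, more tolerant to outliers
```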
In your example, SGDClassifier with loss='log_loss' optimizes the same loss function as LogisticRegression, but with a different solver. Depending on your data, you can get different results. You can try to find the best model using cross-validation, or even a grid-search cross-validation over the hyper-parameters.
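A minimal grid-search sketch (the pipeline, synthetic data, and parameter grid here are illustrative assumptions, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=0)

# Standardize features, since SGD's default learning rate expects that.
pipe = make_pipeline(StandardScaler(), SGDClassifier(loss="log_loss"))
param_grid = {
    "sgdclassifier__alpha": [1e-4, 1e-3, 1e-2],  # regularization strength
    "sgdclassifier__penalty": ["l2", "l1"],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```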
Hope that answers your questions.
Basically, SGD is like an umbrella capable of fitting different linear models. SGD is an approximate, iterative algorithm: it takes one sample (or a small batch) at a time, and as the number of samples processed grows, it converges toward the optimal solution. It is therefore mostly used when the dataset is large. LogisticRegression uses a full-batch solver (such as "lbfgs" or "liblinear") by default, so it is slower by comparison on large datasets. To make SGD perform well for a particular linear model, say logistic regression here, we tune its parameters; this is called hyperparameter tuning.
All linear classifiers (SVM, logistic regression, among others) can be trained with SGD: Stochastic Gradient Descent.