In classical statistics, the assumptions behind a model are usually stated explicitly (e.g. normality and linearity of the data, independence of observations). But when I read machine learning textbooks and tutorials, the underlying assumptions are not always stated explicitly or completely. What are the major assumptions of the following ML classifiers for binary classification, which ones are not so important to uphold, and which must be upheld strictly?
Thus, SVMs can be described as linear classifiers resting on two ideas: the margin should be as large as possible, and the support vectors are the most useful data points because, lying closest to the decision boundary, they are the ones most likely to be misclassified.
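Here is a minimal sketch of that idea (scikit-learn, the toy dataset, and the large C value are my assumptions, not part of the answer): after fitting a linear SVM, the support vectors and the margin width 2/||w|| can be read straight off the model.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy, (nearly) linearly separable binary data.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

# A very large C approximates a hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                                # normal vector of the separating hyperplane
print("margin width:", 2 / np.linalg.norm(w))   # the quantity the SVM maximizes
print("number of support vectors:", len(clf.support_vectors_))
```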
Logistic regression assumes that there is minimal or no multicollinearity among the independent variables, it usually requires a large sample size to predict properly, and it assumes the observations to be independent of each other. A quick check for the multicollinearity assumption is sketched below.
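As a hedged illustration of the multicollinearity point (statsmodels, the synthetic data, and the 5-10 rule of thumb are my additions): variance inflation factors (VIF) flag near-collinear predictors before you fit the model.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):              # skip the intercept column
    print(f"VIF(x{i}) = {variance_inflation_factor(X, i):.1f}")
# Values far above the usual 5-10 rule of thumb (x1 and x2 here)
# signal multicollinearity that logistic regression handles poorly.
```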
SVM uses the kernel trick to solve non-linear problems, whereas decision trees carve the input space into hyper-rectangles. Decision trees are better for categorical data and handle collinearity better than SVM.
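A small comparison sketch (the moons dataset, model settings, and train/test split are illustrative assumptions): an RBF-kernel SVM handles the non-linearity via the kernel trick, while the decision tree does it with axis-aligned splits, i.e. hyper-rectangles.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)                                      # implicit feature map
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)   # hyper-rectangles

print("SVM accuracy: ", svm.score(X_te, y_te))
print("tree accuracy:", tree.score(X_te, y_te))
```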
In machine learning, tree-based techniques and Support Vector Machines (SVM) are popular tools for building prediction models. Given how they work, both decision trees and SVM can be intuitively understood as ways of separating different groups (labels).
IID data (independent and identically distributed) is the fundamental assumption of almost all statistical learning methods.
Logistic regression is a special case of the GLM (generalized linear model). So, beyond some technical requirements, the strictest restriction lies in the assumed distribution of the data: the response MUST follow a distribution in the exponential family. You can dig deeper at https://en.wikipedia.org/wiki/Generalized_linear_model, and Stanford CS229 lecture note 1 also has excellent coverage of this topic.
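To make the GLM view concrete, here is a sketch (statsmodels is my choice of library; the simulated data is an assumption): logistic regression is just a GLM with a Binomial response and the default logit link, a member of the exponential family.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
p = 1 / (1 + np.exp(-X @ np.array([0.5, 1.0, -2.0])))   # true logistic model
y = rng.binomial(1, p)                                  # Binomial (Bernoulli) response

model = sm.GLM(y, X, family=sm.families.Binomial())     # logit link is the default
print(model.fit().summary())
```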
SVM is quite tolerant of input data, especially the soft-margin version. I cannot remember any specific assumption about the data being made (please correct me if I am wrong).
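For instance (a sketch; the amount of class overlap and the C value are assumptions), a soft-margin SVM happily fits data that no hard margin could separate, by allowing slack:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Heavily overlapping classes: not linearly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # finite C = soft margin
print("training accuracy:", clf.score(X, y))  # imperfect, and that's fine: slack absorbs violations
```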
Decision trees tell the same story as SVM: they make essentially no distributional assumptions about the data.
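One way to see how few assumptions trees make (my sketch, not from the answer): splits depend only on the rank order of feature values, so a tree fit on a monotonically transformed feature classifies the training data exactly as the tree fit on the raw feature does.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = (X[:, 0] > 4).astype(int)                 # deterministic threshold labels

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_log = DecisionTreeClassifier(random_state=0).fit(np.log1p(X), y)  # monotone transform

same = np.array_equal(tree_raw.predict(X), tree_log.predict(np.log1p(X)))
print("identical predictions after monotone transform:", same)  # True
```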
Great question.
Logistic Regression also assumes the following:
That there isn't (or there is little) multicollinearity (high correlation) among the independent variables.
Even though LR doesn't require the dependent and independent variables to be linearly related, it does require the independent variables to be linearly related to the log odds. The log odds function is simply log(p/(1-p)).
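A hedged sketch of checking that requirement (the simulated data, the binning scheme, and the clipping are my assumptions): bin a predictor, compute the empirical log odds log(p/(1-p)) in each bin, and see whether they grow roughly linearly with the predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=5000)
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # true logistic relationship
y = rng.binomial(1, p_true)

bins = np.linspace(-3, 3, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (x >= lo) & (x < hi)
    p = np.clip(y[in_bin].mean(), 1e-3, 1 - 1e-3)   # clip to avoid log(0)
    print(f"x in [{lo:+.1f}, {hi:+.1f}): log odds = {np.log(p / (1 - p)):+.2f}")
# For a well-specified logistic model the printed log odds increase
# roughly linearly across the bins.
```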