I am a newbie in machine learning and natural language processing.
I am always confused about the difference between three terms: class, features, and parameters.
From my understanding:
class: the various categories our model outputs. For example, given a person's name, identify whether he/she is male or female.
Let's say I am using a Naive Bayes classifier.
What would be my features and parameters?
Also, what are some aliases of these terms that are used interchangeably?
Thank you
Features are individual and independent variables that measure a property or characteristic of the task. Choosing informative, discriminative, and independent features is the first important decision when implementing any model.
What is a parameter in a machine learning model? A model parameter is a configuration variable that is internal to the model and whose value is estimated from the given data. Parameters are required by the model when making predictions, and their values determine how well the model performs on your problem.
In summary, model parameters are estimated from data automatically, while model hyperparameters are set manually and are used in the process of estimating the model parameters. Model hyperparameters are often loosely referred to as parameters because they are the parts of the machine learning pipeline that must be set manually and tuned.
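To make the parameter/hyperparameter distinction concrete, here is a minimal Python sketch, assuming a Naive Bayes model whose only feature is the last letter of a name. The smoothing constant `alpha` (additive/Laplace smoothing) is a hyperparameter chosen by hand; the class priors and per-letter probabilities are parameters computed from the data. The training pairs and the function name `fit` are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical toy training data: (last letter of a name, class label).
# Illustrative only, not real statistics.
training = [("a", "female"), ("a", "female"), ("e", "female"),
            ("n", "male"), ("k", "male"), ("a", "male")]

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fit(data, alpha):
    """Estimate Naive Bayes parameters from data.

    alpha is a hyperparameter: picked by hand (or by tuning), never learned here.
    The returned priors and likelihoods are parameters: computed from the data.
    """
    class_counts = Counter(label for _, label in data)
    pair_counts = Counter(data)  # counts of each (letter, label) pair
    priors = {c: n / len(data) for c, n in class_counts.items()}
    likelihoods = {
        c: {
            ch: (pair_counts[(ch, c)] + alpha) / (class_counts[c] + alpha * len(ALPHABET))
            for ch in ALPHABET
        }
        for c in class_counts
    }
    return priors, likelihoods

priors, likelihoods = fit(training, alpha=1.0)
print(priors)                      # {'female': 0.5, 'male': 0.5}
print(likelihoods["female"]["a"])  # (2 + 1) / (3 + 26) = 3/29, about 0.1034
```

Changing `alpha` changes the learned probabilities, but `alpha` itself is never updated by `fit`; that is exactly what separates it from the parameters.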
Let's use the example of classifying the gender of a person. Your understanding of class is correct! Given an input observation, our Naive Bayes classifier should output a category. The class is that category.
Features: Features in a Naive Bayes classifier, or in any general ML classification algorithm, are the data points we choose to describe our input. For the example of a person, we can't possibly input every data point about a person; instead, we pick a few features to represent a person (say "Height", "Weight", and "Foot Size"). Specifically, the key assumption a Naive Bayes classifier makes is that these features are conditionally independent given the class: once we know whether the person is male or female, a person's height is assumed to tell us nothing more about their weight or foot size. This assumption may or may not be true, but a Naive Bayes classifier assumes that it is. In the particular case of your example, where the input is just the name, features might be the frequency of letters, the number of vowels, the length of the name, or its suffixes/prefixes.
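For the name-only version of the problem, the feature ideas listed above (letter frequencies, vowel count, length, prefixes/suffixes) could be extracted roughly like this. It is a minimal Python sketch; the function name `name_features`, the two-character prefix/suffix length, and counting only "aeiou" as vowels are illustrative assumptions, not a standard feature set.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel

def name_features(name: str) -> dict:
    """Turn a raw name into a small feature dictionary (hypothetical feature set)."""
    name = name.lower()
    letters = [ch for ch in name if ch.isalpha()]
    features = {
        "length": len(letters),
        "num_vowels": sum(ch in VOWELS for ch in letters),
        "prefix2": name[:2],
        "suffix2": name[-2:],
        "last_letter": name[-1:],  # slicing avoids an IndexError on an empty name
    }
    # Letter frequencies, e.g. "count(a)": 2
    for ch, n in Counter(letters).items():
        features[f"count({ch})"] = n
    return features

print(name_features("Maria"))
```

Each key in the returned dictionary is one feature; the classifier never sees the raw name, only these values.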
Parameters: Parameters in Naive Bayes are the estimates of the true distribution of whatever we're trying to classify. For example, we could say that roughly 50% of people are male, and the distribution of male height is a Gaussian distribution with mean 5' 7" and standard deviation 3". The parameters would be the 50% estimate, the 5' 7" mean estimate, and the 3" standard deviation estimate.
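The sketch below shows how those parameters (a prior, a mean, and a standard deviation per class) could be estimated from data for a single "Height" feature and then used to classify. It is a minimal Python sketch, assuming made-up heights in inches; it uses the population standard deviation (`statistics.pstdev`, the maximum-likelihood estimate), though some implementations use the sample standard deviation instead. The function names are hypothetical.

```python
import math
import statistics

# Hypothetical training data: (height in inches, class). Illustrative only.
data = [(67, "male"), (70, "male"), (64, "male"),
        (63, "female"), (65, "female"), (61, "female")]

def estimate_parameters(samples):
    """Estimate Gaussian Naive Bayes parameters for a single feature."""
    params = {}
    for label in sorted({c for _, c in samples}):
        heights = [h for h, c in samples if c == label]
        params[label] = {
            "prior": len(heights) / len(samples),  # like the "50% are male" estimate
            "mean": statistics.mean(heights),      # like the 5' 7" estimate
            "std": statistics.pstdev(heights),     # like the 3" estimate
        }
    return params

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def predict(height, params):
    # Choose the class maximising prior * likelihood (Bayes' rule without the shared denominator).
    return max(params, key=lambda c: params[c]["prior"]
               * gaussian_pdf(height, params[c]["mean"], params[c]["std"]))

params = estimate_parameters(data)
print(params["male"])        # prior 0.5, mean 67, std sqrt(6), about 2.449
print(predict(68, params))   # 'male'
```

With more features, the Naive Bayes assumption lets you simply multiply one such Gaussian likelihood per feature.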
Aliases: Features are also referred to as attributes. I'm not aware of any common replacements for 'parameters'.
I hope that was helpful!
@txizzle explained the case of Naive Bayes well. In a more general sense:
Class: The output category of your data. These are also called categories. Each label in your data points to one of the classes (if it's a classification problem, of course).
Features: The characteristics that define your problem. These are also called attributes.
Parameters: The variables your algorithm is trying to tune to build an accurate model.
As an example, let us say you are trying to decide whether to admit a student to grad school based on various factors like his/her undergrad GPA, test scores, recommendation scores, projects, etc. In this case, the factors mentioned above are your features/attributes, admit and reject are your two classes, and the numbers that decide how these features are combined to produce your output are your parameters. What the parameters actually represent depends on your algorithm. For a neural net, they are the weights on the connections (the "synaptic links") between neurons. Similarly, for a regression problem, the parameters are the coefficients of your features when they are combined.
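As a rough illustration of the admissions example, assuming logistic regression as the algorithm (the answer does not prescribe one), the sketch below treats GPA and test-score percentile as features, admit (1) / reject (0) as the two classes, and the learned weights and bias as the parameters. The applicant data, the learning rate `lr`, and the epoch count are made up; note that `lr` and `epochs` are themselves hyperparameters.

```python
import math
import statistics

# Hypothetical applicants: (undergrad GPA, test score percentile); 1 = admit, 0 = reject.
X = [(3.9, 95), (3.7, 85), (3.5, 90), (2.8, 60), (3.0, 50), (2.5, 70)]
y = [1, 1, 1, 0, 0, 0]

# Standardise each feature so plain gradient descent behaves well.
means = [statistics.mean(col) for col in zip(*X)]
stds = [statistics.pstdev(col) for col in zip(*X)]

def standardise(x):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def fit(Z, y, lr=0.5, epochs=2000):
    """Learn the parameters (one weight per feature plus a bias) by gradient descent on log loss."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, target in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - target
            grad_w = [g + err * zj / n for g, zj in zip(grad_w, z)]
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Return the class (1 = admit, 0 = reject) for raw features x."""
    z = standardise(x)
    return int(sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5)

w, b = fit([standardise(x) for x in X], y)
print("parameters (weights, bias):", w, b)
print([predict(x, w, b) for x in X])
```

The features and classes come from the problem; only `w` and `b` are adjusted by training.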
Take a simple linear classification problem:
y = 0 if 5x - 3 >= 0, otherwise y = 1
Here y is the class, x is the feature, and 5 and -3 (the weight and the offset) are the parameters.
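Written as code, the rule makes the three roles explicit: `x` is the feature, the returned 0 or 1 is the class, and `w = 5`, `b = -3` are the parameters. This is a minimal Python sketch; the function name `classify` and the argument names `w` and `b` are my own labels. A learning algorithm would adjust `w` and `b`; with these values the decision boundary sits at x = 3/5 = 0.6.

```python
def classify(x, w=5.0, b=-3.0):
    """Linear threshold classifier: class 0 if w*x + b >= 0, else class 1.

    x is the feature; w and b are the parameters (5 and -3 in the example).
    """
    return 0 if w * x + b >= 0 else 1

print([classify(x) for x in (0.0, 0.5, 0.8, 1.0)])  # [1, 1, 0, 0]
```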