I'm using the scikit-learn library (Python) for a machine learning project. One of the algorithms I'm using is its Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB class is the following:
class_prior_ : array, shape (n_classes,)
I want to set the class prior manually, since the data I use is very skewed and the recall of one of the classes is very important. By assigning a high prior probability to that class, its recall should increase.
However, I can't figure out how to set the attribute correctly. I've read the below topics already but their answers don't work for me.
How can the prior probabilities manually set for the Naive Bayes clf in scikit-learn?
How do I know what prior's I'm giving to sci-kit learn? (Naive-bayes classifiers.)
This is my code:
    gnb = GaussianNB()
    gnb.class_prior_ = [0.1, 0.9]
    gnb.fit(data.XTrain, yTrain)
    yPredicted = gnb.predict(data.XTest)
I figured this was the correct syntax, and that I could find out which class belongs to which position in the array by playing with the values, but the results remain unchanged. No errors are raised either.
What is the correct way of setting the attributes of the GaussianNB algorithm from scikit-learn library?
Link to the scikit documentation of GaussianNB
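class_prior_ is an attribute rather than a parameter. It only becomes meaningful once you have fitted the GaussianNB(): by default, fit() computes it as the relative frequency of each label in your training sample, overwriting anything you assigned to it beforehand. That is why setting it before calling fit() has no effect and raises no error.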
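To see this behaviour for yourself, here is a minimal sketch. The toy data is made up for illustration, and it assumes a reasonably recent scikit-learn and NumPy; it only shows that a value assigned before fit() does not survive fitting.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up for illustration): three samples of class 0, one of class 1.
X = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, 0.0], [0.0, 1.0]])
y = np.array([0, 0, 0, 1])

gnb = GaussianNB()
gnb.class_prior_ = np.array([0.1, 0.9])  # assigned before fit: gets overwritten
gnb.fit(X, y)

# Relative label frequencies computed by hand for comparison.
labels, counts = np.unique(y, return_counts=True)
empirical_prior = counts / counts.sum()

print(gnb.class_prior_)  # the fitted value, not [0.1, 0.9]
print(empirical_prior)   # same numbers as the line above
```

Both prints should show the label frequencies of y (3 of 4 samples are class 0), not the values assigned by hand.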
A conditional probability can in principle be computed from the joint probability, but doing so directly is often intractable. Bayes' theorem provides a principled way of calculating it. Its simple form is:
P(A|B) = P(B|A) * P(A) / P(B)
The posterior probability is calculated by updating the prior probability using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.
Step 1: Calculate the prior probability of each class label.
Step 2: Find the likelihood of each attribute value given each class.
Step 3: Put these values into Bayes' formula and calculate the posterior probability of each class.
Step 4: Compare the posteriors and assign the input to the class with the highest posterior probability.
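The four steps above can be sketched in plain Python for a single numeric feature. This is only an illustration, not scikit-learn's implementation: the data, the function names (fit, posterior, predict) and the optional priors argument are all made up here, and unlike GaussianNB there is no variance smoothing, so every class needs a non-zero variance.

```python
import math

# Toy training data (made up): one numeric feature, two classes.
samples = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (3.0, "b"), (5.0, "b")]

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit(samples):
    """Steps 1 and 2: per-class empirical prior, mean and variance."""
    model = {}
    for label in sorted({lab for _, lab in samples}):
        xs = [x for x, lab in samples if lab == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[label] = {"prior": len(xs) / len(samples), "mean": mean, "var": var}
    return model

def posterior(model, x, priors=None):
    """Step 3: Bayes' theorem. `priors` optionally overrides the empirical priors."""
    joint = {}
    for label, p in model.items():
        prior = priors[label] if priors is not None else p["prior"]
        joint[label] = prior * gaussian_pdf(x, p["mean"], p["var"])
    evidence = sum(joint.values())  # P(B), the normalising constant
    return {label: value / evidence for label, value in joint.items()}

def predict(model, x, priors=None):
    """Step 4: pick the class with the highest posterior."""
    post = posterior(model, x, priors)
    return max(post, key=post.get)

model = fit(samples)
print(predict(model, 3.0))                                # a
print(predict(model, 3.0, priors={"a": 0.1, "b": 0.9}))   # b
```

With the empirical priors (0.6 and 0.4) the point 3.0 goes to class "a"; overriding the priors in favour of "b" flips the decision, which is the effect the question is after.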
@Jianxun Li: there is in fact a way to set prior probabilities in GaussianNB. It's called 'priors' and it's available as a constructor parameter. See the documentation:
"Parameters: priors : array-like, shape (n_classes,) Prior probabilities of the classes. If specified the priors are not adjusted according to the data."
So let me give you an example:
    from sklearn.naive_bayes import GaussianNB

    # minimal dataset
    X = [[1, 0], [1, 0], [0, 1]]
    y = [0, 0, 1]

    # use empirical prior, learned from y
    mn = GaussianNB()
    print(mn.fit(X, y).predict([[1, 1]]))  # predict expects a 2D array of samples
    print(mn.class_prior_)
    # Output:
    # [0]
    # [0.66666667 0.33333333]
But if you change the prior probabilities, it gives a different answer, which I believe is what you are looking for.
    # use custom prior to make 1 more likely
    mn = GaussianNB(priors=[0.1, 0.9])
    print(mn.fit(X, y).predict([[1, 1]]))
    # Output:
    # [1]
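On the question of which class belongs to which position in the array: the entries of priors follow the order of the fitted classes_ attribute, which holds the sorted unique labels. A small sketch, using the same toy features but with made-up string labels ("common", "rare") so the ordering is visible:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Same toy features as in the example above; the string labels are
# hypothetical and chosen only to make the ordering visible.
X = np.array([[1, 0], [1, 0], [0, 1]])
y = np.array(["common", "common", "rare"])

# The priors must sum to 1 and have one entry per class.
clf = GaussianNB(priors=[0.1, 0.9]).fit(X, y)

# classes_ holds the sorted unique labels; priors[i] belongs to classes_[i].
for label, prior in zip(clf.classes_, clf.class_prior_):
    print(label, prior)

print(clf.predict([[1, 1]]))  # the sample is equally far from both classes, so the prior decides
```

So rather than guessing by trial and error, you can fit once, inspect classes_, and then order your priors to match.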