
Understanding Bayes' Theorem

I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' Theorem as:

Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)

As well as a specific example relevant to document classification:

Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)

I was hoping someone could explain the notation used here. What do Pr(A | B) and Pr(A) mean? It looks like some sort of function, but then what does the pipe ("|") mean?

benmcredmond asked Dec 29 '09 11:12


4 Answers

  • Pr(A | B) = Probability of A happening given that B has already happened
  • Pr(A) = Probability of A happening

The above, however, only covers conditional probability itself. What you want is a classifier, which uses this principle to decide whether something belongs to a category based on previously observed probabilities.

See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for a complete example
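
To make the classifier idea concrete, here is a minimal sketch in Python of how Pr(Category | Document) can be compared across categories. The training data, the function names `train` and `classify`, and the add-one smoothing are all assumptions made for illustration; they do not come from the book. Pr(Document) is left out because it is the same for every category and so does not change which category scores highest. The "naive" part is treating the words as independent given the category.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (category, document text) pairs.
TRAINING = [
    ("spam", "buy cheap pills now"),
    ("spam", "cheap pills cheap offer"),
    ("ham", "meeting notes for the project"),
    ("ham", "project schedule and meeting agenda"),
]

def train(examples):
    """Count documents per category and word occurrences per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for category, text in examples:
        doc_counts[category] += 1
        word_counts[category].update(text.split())
    return doc_counts, word_counts

def classify(text, doc_counts, word_counts):
    """Return the category maximising Pr(Document | Category) x Pr(Category)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        # Start from log Pr(Category); logs avoid underflow on long documents.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.split():
            # log Pr(word | Category), with add-one smoothing (an assumption)
            # so that unseen words do not force the probability to zero.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get)

doc_counts, word_counts = train(TRAINING)
print(classify("cheap pills", doc_counts, word_counts))
print(classify("project meeting", doc_counts, word_counts))
```

With this toy data the first call should pick "spam" and the second "ham".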

Vinko Vrsalovic answered Nov 12 '22 15:11


I think they've got you covered on the basics.

Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)

reads: the probability of A given B is the same as the probability of B given A times the probability of A divided by the probability of B. It's usually used when you can measure the probability of B and you are trying to figure out if B is leading us to believe in A. Or, in other words, we really care about A, but we can measure B more directly, so let's start with what we can measure.
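
As a quick sanity check of that reading, the sketch below computes Pr(A | B) twice from a small table of counts: once directly, and once through Bayes' theorem. The counts are made up for illustration (A = "it rained", B = "the morning was cloudy"), and the variable names are my own.

```python
import math

# Hypothetical tally of 1,000 days (illustrative numbers, not real data):
# A = "it rained", B = "the morning was cloudy".
counts = {
    ("rain", "cloudy"): 150,
    ("rain", "clear"): 50,
    ("dry", "cloudy"): 100,
    ("dry", "clear"): 700,
}
total = sum(counts.values())

rain_days = counts[("rain", "cloudy")] + counts[("rain", "clear")]
cloudy_days = counts[("rain", "cloudy")] + counts[("dry", "cloudy")]

pr_a = rain_days / total                              # Pr(A)
pr_b = cloudy_days / total                            # Pr(B)
pr_b_given_a = counts[("rain", "cloudy")] / rain_days  # Pr(B | A)

# Pr(A | B) counted directly: rainy days among the cloudy ones.
direct = counts[("rain", "cloudy")] / cloudy_days

# Pr(A | B) through Bayes' theorem.
via_bayes = pr_b_given_a * pr_a / pr_b

print(round(direct, 3), round(via_bayes, 3))
print(math.isclose(direct, via_bayes))
```

Both routes give the same number, which is the whole content of the theorem: it lets you get Pr(A | B) from quantities that are easier to measure.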

Let me give you one derivation that makes this easier for writing code. It comes from Judea Pearl. I struggled with this a little, but after I realized how Pearl helps us turn theory into code, the light turned on for me.

Prior Odds:

O(H) = P(H) / (1 - P(H))

Likelihood Ratio:

L(e|H) = P(e|H) / P(e|¬H)

Posterior Odds:

O(H|e) = L(e|H)O(H)
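
For anyone wondering where the posterior-odds form comes from, it follows from applying Bayes' theorem once to H and once to ¬H and dividing the two results, so that P(e) cancels. A short derivation in LaTeX notation, using the same symbols as above and P(¬H) = 1 - P(H):

```latex
\begin{align}
P(H \mid e) &= \frac{P(e \mid H)\,P(H)}{P(e)} \\
P(\neg H \mid e) &= \frac{P(e \mid \neg H)\,P(\neg H)}{P(e)} \\
\frac{P(H \mid e)}{P(\neg H \mid e)}
  &= \frac{P(e \mid H)}{P(e \mid \neg H)} \cdot \frac{P(H)}{1 - P(H)} \\
O(H \mid e) &= L(e \mid H)\,O(H)
\end{align}
```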

In English, we are saying that the prior odds of something you're interested in (H for hypothesis) are simply the number of times you find it to be true divided by the number of times you find it not to be true. So, say one house out of 10,000 is robbed every day. That means you have a 1/10,000 chance of being robbed on a given day, or odds of 1 to 9,999, without any other evidence being considered.

The next one measures the evidence you're looking at: the probability of seeing the evidence when your hypothesis is true, divided by the probability of seeing the evidence when your hypothesis is not true. Say you hear your burglar alarm go off. How often do you get that alarm when it's supposed to go off (someone opens a window when the alarm is on) versus when it's not supposed to go off (the wind sets the alarm off)? If you have a 95% chance of a burglar setting off the alarm and a 1% chance of something else setting off the alarm, then you have a likelihood ratio of 95.0.

Your overall belief (the posterior odds) is just the likelihood ratio times the prior odds. In this case it is:

((0.95/0.01) * ((10**-4)/(1 - (10**-4))))
# => 0.0095009500950095

I don't know if this makes it any more clear, but it tends to be easier to have some code that keeps track of prior odds, other code to look at likelihoods, and one more piece of code to combine this information.
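
One way that split might look in Python is sketched below. The function names are my own, not Pearl's, and `odds_to_probability` is an extra helper added for convenience, since odds and probabilities are easy to mix up. The numbers are the burglar-alarm figures from above.

```python
def prior_odds(p_h):
    """O(H) = P(H) / (1 - P(H))."""
    return p_h / (1.0 - p_h)

def likelihood_ratio(p_e_given_h, p_e_given_not_h):
    """L(e|H) = P(e|H) / P(e|not H)."""
    return p_e_given_h / p_e_given_not_h

def posterior_odds(ratio, odds):
    """O(H|e) = L(e|H) * O(H)."""
    return ratio * odds

def odds_to_probability(odds):
    """Convert odds back to a probability: P = O / (1 + O)."""
    return odds / (1.0 + odds)

# Burglar-alarm example: P(robbed) = 1/10,000, P(alarm | burglar) = 0.95,
# P(alarm | no burglar) = 0.01.
odds = posterior_odds(likelihood_ratio(0.95, 0.01), prior_odds(1e-4))
print(round(odds, 4))                       # 0.0095
print(round(odds_to_probability(odds), 4))  # 0.0094
```

So even with the alarm ringing, the chance of an actual burglary is still under 1%, because the prior is so small.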

David Richards answered Nov 12 '22 15:11


I have implemented it in Python. It's easy to follow because each formula behind Bayes' theorem is in a separate function:

# Bayes' theorem

def get_outcomes(sample_space, f_name='', e_name=''):
    # Sum the counts whose outer key matches f_name and inner key matches
    # e_name; an empty string acts as a wildcard that matches every key.
    outcomes = 0
    for e_k, e_v in sample_space.items():
        if f_name == '' or f_name == e_k:
            for se_k, se_v in e_v.items():
                if e_name == '' or se_k == e_name:
                    outcomes += se_v
    return outcomes

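def p(sample_space, f_name):
    # P(F): share of all outcomes that fall under the outer key f_name
    return get_outcomes(sample_space, f_name) / get_outcomes(sample_space, '', '')

def p_inters(sample_space, f_name, e_name):
    # P(F and E): share of all outcomes that match both keys
    return get_outcomes(sample_space, f_name, e_name) / get_outcomes(sample_space, '', '')

def p_conditional(sample_space, f_name, e_name):
    # P(E | F) = P(F and E) / P(F); the condition is the first name argument
    return p_inters(sample_space, f_name, e_name) / p(sample_space, f_name)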

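def bayes(sample_space, f, given_e):
    # P(F | E) = P(F) * P(E | F) / P(E), where P(E) is the sum of
    # P(k) * P(E | k) over every outer key k (law of total probability).
    total = 0
    for e_k in sample_space:
        total += p(sample_space, e_k) * p_conditional(sample_space, e_k, given_e)
    return p(sample_space, f) * p_conditional(sample_space, f, given_e) / total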

sample_space = {'UK':{'Boy':10, 'Girl':20},
                'FR':{'Boy':10, 'Girl':10},
                'CA':{'Boy':10, 'Girl':30}}

print('Probability of being from FR:', p(sample_space, 'FR'))
print('Probability to be French Boy:', p_inters(sample_space, 'FR', 'Boy'))
print('Probability of being a Boy given a person is from FR:', p_conditional(sample_space, 'FR', 'Boy'))
print('Probability to be from France given person is Boy:', bayes(sample_space, 'FR', 'Boy'))

sample_space = {'Grow' :{'Up':160, 'Down':40},
                'Slows':{'Up':30, 'Down':70}}

print('Probability economy is growing when stock is Up:', bayes(sample_space, 'Grow', 'Up'))

Alex answered Nov 12 '22 16:11


Pr(A | B): the conditional probability of A, i.e. the probability of A given that all we know is B

Pr(A): the prior probability of A

Upul Bandara answered Nov 12 '22 14:11