
Bayesian inference

I have an instrument that will either pass or fail a series of three tests. The instrument must pass all three tests to be considered successful. How can I use Bayesian inference to estimate the probability of passing each test based on evidence (that is, based on the instrument passing each earlier test in turn)?

Looking at just the first test, I know the following from historical records of instrument tests. You can also see that each test has an acceptance boundary of -3% to +3%: [image: historical results for the first test]

My Assumptions:

  • Probabilities are dependent on each other, since we are looking at the same instrument over all three tests

  • From this historical data I see that the probability of passing test A is P(A) = 0.84, so the probability of failing is P('A) = 0.16

  • Without knowing anything about an instrument, a good assumption would be equal probabilities of passing and failing the first test. The hypothesis (H) is that the instrument passed, P(H) = 0.5; this also gives us the failed probability P('H) = 0.5.

From my understanding I need to find P(H) given the data (D). In Bayesian terms, I would then update P(H) given the results of test A:

**P(H|D) = P(H) P(D|H) / P(D)**   where:

**P(D) = P(D|H) P(H) + P(D|'H) P('H)**

This is where I get lost. I think this is correct:

P(H)    = P('H) = 0.5  // prob of passing/failing test-A without any information  

P(D|H)  = 0.84          // prob of passing test-A from historical records

P('D|H) = 0.16         // prob of failing test-A from historical records

P(D) = P(D|H)*P(H) + P(D|'H)*P('H) = 0.84*0.5 + 0.16*0.5
P(D) = 0.5

This gives a posterior of P(H|D) = P(H) P(D|H) / P(D) = 0.5*0.84 / 0.5 = 0.84. Is this my new updated value for P(H) in test B?
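The arithmetic above can be checked with a small sketch. The function name bayes_update and its parameters are hypothetical, and the likelihoods are simply the values assumed in the question (0.84 under H, 0.16 under 'H), not values confirmed by the data.

```python
def bayes_update(prior_h, p_d_given_h, p_d_given_not_h):
    """Return (P(D), P(H|D)) for a binary hypothesis H."""
    # Total probability of the data over H and 'H
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1.0 - prior_h)
    # Bayes' rule
    posterior = p_d_given_h * prior_h / p_d
    return p_d, posterior


# Values assumed in the question: P(H) = 0.5, P(D|H) = 0.84, P(D|'H) = 0.16
p_d, posterior = bayes_update(0.5, 0.84, 0.16)
print(round(p_d, 6), round(posterior, 6))
```

With a 0.5 prior and likelihoods that sum to 1, the posterior is just the likelihood again, which is why the update appears to add no information.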


Out of interest, all three tests look similar: [image: historical results for all three tests]

Harry Lime asked Jul 20 '15 12:07


2 Answers

There are a couple of things to take into account here. First, you are right that the a priori probabilities to use are 0.5 and 0.5 respectively, because that is how we mathematically encode not knowing what is going on. However, you are showing the three graphs independently of each other and writing Bayes' equations with only one dimension, which violates your dependence assumption. Also, there is no need to use your marginalized P(D) in this setup to get to the conditional probabilities you are asking about.

What you are really after is the conditional probability that the instrument will pass test C given how it did on test A and/or test B.

If you have only done test A, then Bayes says:

P(C|A) = P(A|C)P(C)/P(A) or P(B|A) = P(A|B)P(B)/P(A)

where A, B, and C can each take the value pass or fail.
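As a quick sanity check, the Bayes form of P(C|A) and the value obtained by counting directly agree. This is only a sketch: the records below are made up for illustration, and the helper names p, p_given and passed are hypothetical.

```python
# Hypothetical pass/fail records (1 = pass, 0 = fail) for tests A, B, C.
records = [(0, 0, 1), (1, 1, 1), (1, 0, 0), (1, 1, 1), (1, 1, 0)]
A, B, C = 0, 1, 2


def passed(test):
    """Build a predicate that is true when a record passed the given test."""
    return lambda record: record[test] == 1


def p(event):
    """Fraction of all records for which event(record) is true."""
    return sum(1 for r in records if event(r)) / len(records)


def p_given(event, cond):
    """P(event | cond), estimated by counting within records matching cond."""
    matching = [r for r in records if cond(r)]
    return sum(1 for r in matching if event(r)) / len(matching)


# P(C|A) counted directly, and via P(A|C) P(C) / P(A)
direct = p_given(passed(C), passed(A))
via_bayes = p_given(passed(A), passed(C)) * p(passed(C)) / p(passed(A))
print(direct, via_bayes)
```

Both routes use the same counts, so Bayes' rule adds nothing here beyond rearranging them.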

If you have done tests A and B then you want to know the probability of passing test C which Bayes says is:

P(C|A,B) = P(A,B|C)P(C)/P(A,B)

Which looks much more complicated, but the thing is you don’t really need to do Bayesian Inference to get the conditional probabilities you are asking for:

What is my probability of passing the next test given that I have already passed or failed this test?

You have all the information you need to compute that directly. One typically uses Bayesian inference when they don’t have that luxury.
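To make the "compute it directly" point concrete, the two-test case can be written in terms of counts. Here N(.) is assumed notation (it does not appear in the original answer) for the number of historical instruments with the stated outcomes, so the right-hand side is only the empirical frequency:

```latex
P(C \mid A, B) = \frac{P(A, B, C)}{P(A, B)} \approx \frac{N(A, B, C)}{N(A, B)}
```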

To answer your question about how to calculate the probability that a future test will pass, based on whether the instrument has already passed one or more tests, think about what the values you want actually mean.

“Given that the instrument passed (or failed) test 1, what is the chance it will pass test 2 and test 3”

With your historical data you can answer this question directly.

Your question states that you care about the probability of pass/fail, so there are 2 possible outcomes for each test, meaning that you really only have 8 states to consider for each instrument test set:

(Number of TestA Outcomes)* (Number of TestB Outcomes)* (Number of TestC Outcomes) = 2*2*2 = 8
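The eight states can be listed explicitly. A minimal sketch, assuming 0 encodes a fail and 1 a pass (the variable name states is hypothetical):

```python
from itertools import product

# Each of the three tests has two outcomes: 0 = fail, 1 = pass.
states = list(product((0, 1), repeat=3))

for a, b, c in states:
    print(a, b, c)

print(len(states))  # 2 * 2 * 2 combinations
```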

To calculate the probabilities you want, consider a 3D matrix, which we will call ProbabilityHistogram (ProbHisto in the code below), with a cell for each outcome. The matrix is therefore 2*2*2 and is indexed by whether or not each test was passed historically. We are going to use this matrix to build a histogram of historical pass/fail data and then reference that histogram to compute the probabilities of interest.

In this approach, the number of times that any previously tested instrument passed test A, failed test B, and passed test C would be found in ProbabilityHistogram[1,0,1]; passing all three would be found in ProbabilityHistogram[1,1,1]; failing all three in ProbabilityHistogram[0,0,0]; and so on.

Here is how to calculate the values you want

Setup of Required Histogram

  • Start by defining a 2*2*2 matrix to hold histogram data
  • Read in your historical data
  • For every historical test set in your data, update the ProbabilityHistogram using the updateProbHisto code below

Calculate the Probabilities of interest:

  • Calculate Conditional probabilities after one test using CProb_BCgA below
  • Calculate Conditional Probabilities after two tests using CProb_CgAB below

Code: (Sorry it is in C# because I have limited experience in Python, if you have questions just leave a comment and I'll explain further)

Set up the 3D matrix

// Define probability histogram
double[, ,] ProbHisto = new double[2, 2, 2]; // [A test outcome, B test outcome, C test outcome]

Update the Histogram

// Update histogram based on historical data.
// Pass in how the instrument did on each test as one dataset.
void updateProbHisto(bool APassed, bool BPassed, bool CPassed) {
    ProbHisto[Convert.ToInt16(APassed), Convert.ToInt16(BPassed), Convert.ToInt16(CPassed)]++;
}

Calculate Probabilities after one test

// Calculate the conditional probabilities P(B passes | A) and P(C passes | A) given A's test result
double[] CProb_BCgA(bool ATestResult) {
    // Look only at historical instruments whose test A result matches this instrument's
    int a = ATestResult ? 1 : 0;
    double[] rvalue = {0.0, 0.0}; // P(B|A), P(C|A)
    double BPassesGivenA = ProbHisto[a, 1, 0] + ProbHisto[a, 1, 1];
    double CPassesGivenA = ProbHisto[a, 1, 1] + ProbHisto[a, 0, 1];
    rvalue[0] = BPassesGivenA / (BPassesGivenA + ProbHisto[a, 0, 0] + ProbHisto[a, 0, 1]); // B passes over B passes + B failures
    rvalue[1] = CPassesGivenA / (CPassesGivenA + ProbHisto[a, 0, 0] + ProbHisto[a, 1, 0]); // C passes over C passes + C failures
    return rvalue;
}

Calculate probabilities after two tests

// Calculate the conditional probability that test C will pass, looking only at
// historical instruments that passed or failed tests A and B the same way this one did
double CProb_CgAB(bool ATestResult, bool BTestResult)
{
    int a = ATestResult ? 1 : 0;
    int b = BTestResult ? 1 : 0;
    double CPassesGivenAB = ProbHisto[a, b, 1];
    // P(C|A,B): C passes over C passes + C failures
    return CPassesGivenAB / (CPassesGivenAB + ProbHisto[a, b, 0]);
}

The conditional probability functions assume that you do test A, then test B, then test C (CProb_BCgA returns the probability of B passing and the probability of C passing, each given the result of test A; it does not return the joint probability of both passing). It is straightforward to substitute the test result for B or C instead of the result for A; just bear in mind which index you are putting the pass/fail data in.
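For readers who prefer Python, the same histogram idea can be sketched as follows. This is a loose translation under stated assumptions, not the answer's own code: the names hist, update_hist, prob_bc_given_a and prob_c_given_ab are hypothetical, and the records are made up for illustration.

```python
# 2x2x2 table of counts indexed as hist[a][b][c], with 0 = fail and 1 = pass.
hist = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]


def update_hist(a_passed, b_passed, c_passed):
    """Record one instrument's results on tests A, B and C."""
    hist[int(a_passed)][int(b_passed)][int(c_passed)] += 1


def prob_bc_given_a(a_passed):
    """Return (P(B passes | A), P(C passes | A)) from the counts."""
    rows = hist[int(a_passed)]
    total = sum(rows[b][c] for b in (0, 1) for c in (0, 1))
    b_passes = rows[1][0] + rows[1][1]
    c_passes = rows[0][1] + rows[1][1]
    return b_passes / total, c_passes / total


def prob_c_given_ab(a_passed, b_passed):
    """Return P(C passes | A, B) from the counts."""
    cell = hist[int(a_passed)][int(b_passed)]
    return cell[1] / (cell[0] + cell[1])


# Hypothetical historical records (A, B, C), made up for illustration only.
for record in [(False, False, True), (True, True, True), (True, False, False),
               (True, True, True), (True, True, False)]:
    update_hist(*record)

p_b, p_c = prob_bc_given_a(True)
p_c_ab = prob_c_given_ab(True, True)
print(p_b, p_c, p_c_ab)
```

Note that both functions raise ZeroDivisionError when no historical instrument matches the given results, so real data would need a guard for empty cells.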

Semicolons and Duct Tape answered Oct 13 '22 08:10


As Semicolons and Duct Tape said, I too don't think that you need P(H) at all to answer the question. To find P(C|A), i.e. the probability of passing test C given that you passed test A, all you need is P(A & C) and P(A), which seem to be already available to you. The same is the case with P(B|A).

Here's a Python snippet that shows this in action. Assume that the structure experiment is a list of test runs, where each run is a list of three numbers corresponding to the result (1 for pass, 0 for fail) of test A, test B and test C respectively.

def prob_yx(y, x, exp):
    "P(y|x). exp is the list of past experimental runs"

    # Runs where both test x and test y passed
    c_xy = [run for run in exp if run[x] and run[y]]
    # Runs where test x passed
    c_x = [run for run in exp if run[x]]

    return len(c_xy) / float(len(c_x))


experiment = [
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0]
]

A = 0
B = 1
C = 2

# B given A
print(prob_yx(B, A, experiment))
# C given A
print(prob_yx(C, A, experiment))
# C given B
print(prob_yx(C, B, experiment))

This gives (Python 3)

0.75
0.5
0.6666666666666666
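The same counting idea extends to conditioning on more than one earlier test, for example P(C|A,B). A minimal sketch, where the function name prob_given_all is hypothetical and the runs are illustrative data in the same 1 = pass, 0 = fail format:

```python
def prob_given_all(y, xs, exp):
    """P(test y passes | every test in xs passed), estimated by counting."""
    # Keep only the runs where all conditioning tests passed
    matching = [run for run in exp if all(run[x] for x in xs)]
    return sum(run[y] for run in matching) / float(len(matching))


# Illustrative runs: results of tests A, B and C (1 = pass, 0 = fail)
experiment = [
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]

A, B, C = 0, 1, 2

# C given that both A and B passed
p_c_ab = prob_given_all(C, [A, B], experiment)
print(p_c_ab)
```

Only three of the five runs pass both A and B, so the estimate rests on very few samples; with real data it is worth checking how many runs actually match before trusting the ratio.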

Hope this is helpful.

PDK answered Oct 13 '22 06:10