 

Pandas - Conditional probability of a given a specific b

Tags:

python

pandas

I have a DataFrame with two columns, "a" and "b". How can I find the conditional probability of "a" given a specific "b"?

df.groupby('a').groupby('b')

does not work. Let's assume I have 3 categories in column a, and for each one I have 5 categories of b. What I need is the total number of rows in each class of b for each class of a. I tried the apply method, but I think I do not know how to use it properly.

df.groupby('a').apply(lambda x: x[x['b']] == '...').count()
Hamid K asked Nov 02 '15 00:11



People also ask

How do you find conditional probability of B given a?

If A and B are two events in a sample space S, then the conditional probability of A given B is defined as P(A|B) = P(A∩B) / P(B), provided P(B) > 0.

How do you calculate conditional probability in pandas?

Estimate each probability from relative frequencies (counts divided by the number of rows), then apply the formula P(A|B) = P(A∩B) / P(B).
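A minimal sketch of applying P(a | b) = P(a ∩ b) / P(b) in pandas, assuming a DataFrame with two categorical columns `a` and `b` (the sample values below are made up for illustration). `pd.crosstab` builds the joint counts; dividing by the row count gives P(a ∩ b), and dividing each column by its marginal P(b) gives P(a | b).

```python
import pandas as pd

# Hypothetical sample data with two categorical columns
df = pd.DataFrame({
    "a": ["x", "y", "x", "y", "x", "x"],
    "b": ["p", "p", "q", "q", "q", "q"],
})

# Joint probabilities P(a ∩ b): rows are values of a, columns are values of b
joint = pd.crosstab(df["a"], df["b"]) / len(df)

# Marginal probabilities P(b): column sums of the joint table
p_b = joint.sum(axis=0)

# P(a | b) = P(a ∩ b) / P(b); dividing a DataFrame by a Series aligns on columns
cond = joint / p_b
print(cond)

# pd.crosstab can also normalize each column directly
print(pd.crosstab(df["a"], df["b"], normalize="columns"))
```

Each column of `cond` sums to 1, since it is a distribution over `a` for one fixed value of `b`.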

How do you calculate unconditional probability from conditional probability?

The unconditional probability of an event A equals the sum, over a set of mutually exclusive and exhaustive events B1, ..., Bn, of the conditional probability of A given each Bi multiplied by the probability of that Bi: P(A) = Σ P(A|Bi) P(Bi). This is the law of total probability.
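A quick numerical check of this statement, sketched with made-up data: the partition events are the distinct values of a hypothetical column `b`, and summing P(a | b) · P(b) over them should recover the marginal P(a) estimated directly.

```python
import pandas as pd

# Hypothetical data; the distinct values of b partition the sample space
df = pd.DataFrame({
    "a": ["x", "y", "x", "y", "x", "x"],
    "b": ["p", "p", "q", "q", "q", "q"],
})

# P(b): relative frequency of each value of b
p_b = df["b"].value_counts(normalize=True)

# P(a | b): each column of the normalized crosstab sums to 1
p_a_given_b = pd.crosstab(df["a"], df["b"], normalize="columns")

# Law of total probability: P(a) = sum over b of P(a | b) * P(b)
p_a_total = (p_a_given_b * p_b).sum(axis=1)

# Direct estimate of P(a) for comparison
p_a_direct = df["a"].value_counts(normalize=True)
print(p_a_total)
print(p_a_direct)
```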


1 Answer

To count how many times each class of b occurs within each class of a, you can use:

df.groupby('a').b.value_counts()

For example, create a DataFrame as below:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C': np.random.randn(8), 'D': np.random.randn(8)})

     A      B         C         D
0  foo    one -1.565185 -0.465763
1  bar    one  2.499516 -0.941229
2  foo    two -0.091160  0.689009
3  bar  three  1.358780 -0.062026
4  foo    two -0.800881 -0.341930
5  bar    two -0.236498  0.198686
6  foo    one -0.590498  0.281307
7  foo  three -1.423079  0.424715

Then:

df.groupby('A')['B'].value_counts()

A
bar  one      1
     two      1
     three    1
foo  one      2
     two      2
     three    1

To convert these counts into the conditional probability of B given A, divide each count by the total size of its group.

You can either do it with another groupby:

df.groupby('A')['B'].value_counts() / df.groupby('A')['B'].count()

A
bar  one      0.333333
     two      0.333333
     three    0.333333
foo  one      0.400000
     two      0.400000
     three    0.200000
dtype: float64
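`SeriesGroupBy.value_counts` also accepts `normalize=True`, which divides each group's counts by the group size in a single call. A sketch using the `A`/`B` values from the example:

```python
import pandas as pd

# Same A/B values as the example DataFrame
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# Relative frequency of each B within each group of A, an estimate of P(B | A)
p_b_given_a = df.groupby("A")["B"].value_counts(normalize=True)
print(p_b_given_a)

# Look up one entry, P(B = 'two' | A = 'foo')
print(p_b_given_a.loc[("foo", "two")])
```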

Or you can apply a lambda function to each group:

df.groupby('A')['B'].apply(lambda g: g.value_counts() / len(g))
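The computations above condition on `A`. For the direction asked in the question, the probability of a given b, the roles of the two columns can be swapped; to condition on one specific value, filter the rows first. A sketch using the example's columns, where `'one'` merely stands in for whichever specific value of `B` you care about:

```python
import pandas as pd

# Same A/B values as the example DataFrame
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# P(A | B) for every value of B: group on B instead of A
p_a_given_b = df.groupby("B")["A"].value_counts(normalize=True)
print(p_a_given_b)

# P(A | B = 'one'): keep only rows where B == 'one', then take relative frequencies of A
p_a_given_one = df.loc[df["B"] == "one", "A"].value_counts(normalize=True)
print(p_a_given_one)
```

Both routes give the same numbers for `B == 'one'`; the filtered version is simpler when only one conditioning value is needed.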
maxymoo answered Nov 16 '22 00:11
