How to get the p-value between two groups after groupby in pandas?

Vocabulary

test = 0 ==> test group
test = 1 ==> control group

Problem setup

import numpy as np
import pandas as pd
import scipy.stats as stats

np.random.seed(100)
N = 15
df = pd.DataFrame({'country': np.random.choice(['A','B','C'],N),
                   'test': np.random.choice([0,1], N),
                   'conversion': np.random.choice([0,1], N),
                   'sex': np.random.choice(['M','F'], N)

                  })


ans = df.groupby(['country','test'])['conversion'].agg(['size','mean']).unstack('test')
ans.columns = ['test_size','control_size','test_mean','control_mean']
         test_size  control_size  test_mean  control_mean
country                                                  
A                3             3   0.666667      0.666667
B                1             1   1.000000      1.000000
C                4             3   0.750000      1.000000

Question

Now I want to add one more column holding the p-value between the test and control groups for each country. In my groupby, however, each aggregation function receives only one series at a time, and I am not sure how to pass two series (test and control) to get the p-value.

Done so far:

def get_ttest(x,y):
    return stats.ttest_ind(x, y, equal_var=False).pvalue

pseudo code:

df.groupby(['country','test'])['conversion'].agg(
['size','mean', some_function_to_get_pvalue])

How do I get the p-value column?

Required Answer

I need to get the values for the pvalue column:

         test_size  control_size  test_mean  control_mean  pvalue
country                                                  
A                3             3   0.666667      0.666667   ?
B                1             1   1.000000      1.000000   ?
C                4             3   0.750000      1.000000   ?

asked Dec 26 '19 16:12

BhishanPoudel

1 Answer

You can do this:

import numpy as np
import pandas as pd
import scipy.stats as stats

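def get_ttest(x, y, sided=1):
    # Welch's t-test (unequal variances). The two-sided p-value is divided by
    # `sided`: sided=1 keeps the two-sided value, sided=2 halves it.
    return stats.ttest_ind(x, y, equal_var=False).pvalue / sided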

np.random.seed(100)
N = 15
df = pd.DataFrame({'country': np.random.choice(['A','B','C'],N),
                   'test': np.random.choice([0,1], N),
                   'conversion': np.random.choice([0,1], N),
                   'sex': np.random.choice(['M','F'], N)

                  })


col_groupby = 'country'
col_test_control = 'test'
col_effect = 'conversion'

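# unique() returns values in order of appearance; sort them so that (a, b)
# matches the sorted column order produced by unstack() below.
a, b = sorted(df[col_test_control].unique())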

df_pval = df.groupby([col_groupby,col_test_control])\
            [col_effect].agg(['size','mean']).unstack(col_test_control)

df_pval.columns = [f'group{a}_size',f'group{b}_size',
                   f'group{a}_mean',f'group{b}_mean']

df_pval['pvalue'] = df.groupby(col_groupby).apply(lambda dfx: get_ttest(
    dfx.loc[dfx[col_test_control] == a, col_effect],
    dfx.loc[dfx[col_test_control] == b, col_effect]))


df_pval.pipe(print)

Result

         group0_size  group1_size  group0_mean  group1_mean    pvalue
country                                                              
A                  3            3     0.666667     0.666667  1.000000
B                  1            1     1.000000     1.000000       NaN
C                  4            3     0.750000     1.000000  0.391002
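
The NaN for country B is expected: each of its groups holds a single observation, so the sample variance that Welch's t-test needs is undefined. Below is a minimal sketch of making that case explicit rather than relying on how SciPy handles tiny samples. The helper name `safe_ttest`, its `min_n` threshold, and the small `demo` frame are assumptions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def safe_ttest(x, y, min_n=2):
    """Welch's t-test p-value, or NaN when either sample is too small.

    `safe_ttest` and `min_n` are hypothetical names used for illustration.
    """
    if len(x) < min_n or len(y) < min_n:
        return np.nan
    return stats.ttest_ind(x, y, equal_var=False).pvalue

# Tiny hand-made frame (not the random data from the answer).
demo = pd.DataFrame({
    'country':    ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'test':       [0, 0, 0, 1, 1, 1, 0, 1],
    'conversion': [1, 0, 1, 0, 0, 1, 1, 1],
})

# Group on country only, then split each group into its two arms.
pvals = {}
for country, grp in demo.groupby('country'):
    x = grp.loc[grp['test'] == 0, 'conversion']
    y = grp.loc[grp['test'] == 1, 'conversion']
    pvals[country] = safe_ttest(x, y)

pvalue = pd.Series(pvals, name='pvalue')
print(pvalue)
```

Because the loop groups on `country` alone, both arms are available at once, which is the same idea as the `groupby(...).apply(...)` call in the answer.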

Test the result

# test for country C
c0 = df.loc[(df.country=='C') & (df.test==0),'conversion']
c1 = df.loc[(df.country=='C') & (df.test==1),'conversion']

pval = stats.ttest_ind(c0, c1, equal_var=False).pvalue
print(pval) # 0.39100221895577053
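
A note on the `sided` argument of `get_ttest`: dividing the two-sided p-value by 2 gives a one-sided p-value only when the observed difference points in the hypothesized direction. As a hedged alternative sketch, `scipy.stats.ttest_ind` accepts an `alternative` parameter (SciPy 1.6 and later) that handles the direction itself. The samples `x` and `y` below are made up for illustration.

```python
import numpy as np
import scipy.stats as stats

# Made-up binary samples; mean(x) < mean(y).
x = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y = np.array([1, 1, 0, 1, 1, 1, 0, 1])

two_sided = stats.ttest_ind(x, y, equal_var=False).pvalue
# H1: mean(x) < mean(y)
p_less = stats.ttest_ind(x, y, equal_var=False, alternative='less').pvalue
# H1: mean(x) > mean(y)
p_greater = stats.ttest_ind(x, y, equal_var=False, alternative='greater').pvalue

print(two_sided, p_less, p_greater)
```

Here the observed difference agrees with `alternative='less'`, so `p_less` equals half of `two_sided`, while `p_greater` is its complement.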

answered Sep 22 '22 09:09

BhishanPoudel

Tags: python, pandas, numpy, scipy, p-value