
How to calculate Cohen's d in Python?

I need to calculate Cohen's d to determine the effect size of an experiment. Is there an implementation in a well-established library I could use? If not, what would be a good implementation?

Bengt asked Feb 03 '14 16:02




3 Answers

The simple formula that averages the two variances (the 1st method below) is correct only in the special case that the two groups have equal size. A more general solution, based on the formulas found on Wikipedia and in Robert Coe's article, is the 2nd method shown below. Be aware that the denominator is the pooled standard deviation, which is generally appropriate only if the population standard deviation is equal for both groups:

from numpy import mean, sqrt, std


# Correct if the population S.D. is expected to be equal for the two groups.
def cohen_d(x, y):
    nx = len(x)
    ny = len(y)
    dof = nx + ny - 2
    # Pooled S.D.: each sample variance is weighted by its degrees of freedom.
    pooled_sd = sqrt(((nx - 1) * std(x, ddof=1) ** 2 + (ny - 1) * std(y, ddof=1) ** 2) / dof)
    return (mean(x) - mean(y)) / pooled_sd


# Dummy data
x = [2, 4, 7, 3, 7, 35, 8, 9]
y = [i * 2 for i in x]
# Extra element so that the two group sizes are not equal.
x.append(10)

# Correct only if nx == ny
d = (mean(x) - mean(y)) / sqrt((std(x, ddof=1) ** 2 + std(y, ddof=1) ** 2) / 2.0)
print("d by the 1st method = " + str(d))
if len(x) != len(y):
    print("The first method is incorrect because nx is not equal to ny.")

# Correct for the more general case, including nx != ny
print("d by the more general 2nd method = " + str(cohen_d(x, y)))

Output will be:

d by the 1st method = -0.559662109472
The first method is incorrect because nx is not equal to ny.
d by the more general 2nd method = -0.572015604666
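
As a quick sanity check, the two formulas can be compared directly. The sketch below uses only the standard library's statistics module instead of numpy; the function names cohen_d_pooled and cohen_d_simple are made up for illustration. It suggests that the two methods coincide when the groups have the same size and drift apart once they do not:

```python
from math import sqrt
from statistics import mean, variance


def cohen_d_pooled(x, y):
    """Cohen's d using the pooled standard deviation (group sizes may differ)."""
    nx, ny = len(x), len(y)
    # variance() is the sample variance, i.e. it divides by n - 1.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def cohen_d_simple(x, y):
    """Shortcut that averages the two variances; only valid if len(x) == len(y)."""
    return (mean(x) - mean(y)) / sqrt((variance(x) + variance(y)) / 2)


a = [2, 4, 7, 3, 7, 35, 8, 9]
b = [2 * v for v in a]

# Equal group sizes: both formulas agree up to floating-point rounding.
print(abs(cohen_d_pooled(a, b) - cohen_d_simple(a, b)) < 1e-12)

# Unequal group sizes: the shortcut no longer matches the pooled formula.
a_longer = a + [10]
print(abs(cohen_d_pooled(a_longer, b) - cohen_d_simple(a_longer, b)) > 1e-3)
```

Both lines print True for this dummy data.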

skynaut answered Oct 10 '22 12:10


Since Python 3.4, you can use the statistics module to calculate measures of spread and central tendency. With that, Cohen's d can be calculated easily:

from statistics import mean, stdev
from math import sqrt

# test conditions
c0 = [2, 4, 7, 3, 7, 35, 8, 9]
c1 = [i * 2 for i in c0]

# Averaging the two variances is valid here because both groups have the same size.
cohens_d = (mean(c0) - mean(c1)) / (sqrt((stdev(c0) ** 2 + stdev(c1) ** 2) / 2))

print(cohens_d)

Output:

-0.5567679522645598

By Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large), a magnitude of about 0.56 counts as a medium effect.
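
If you want that reading in code, a small helper can map d onto Cohen's conventional cut-offs of 0.2, 0.5 and 0.8. This is only a sketch: interpret_d is a hypothetical name, not part of any library, and the "negligible" label for values below 0.2 is my own assumption. The cut-offs are rules of thumb rather than hard boundaries.

```python
def interpret_d(d):
    """Label |d| using Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    size = abs(d)  # the sign only reflects which group has the larger mean
    if size < 0.2:
        return "negligible"  # assumed label; Cohen's convention starts at "small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


print(interpret_d(-0.5567679522645598))  # medium
```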

Bengt answered Oct 10 '22 13:10


In Python 2.7, you can use numpy with a couple of caveats, as I discovered while adapting Bengt's answer from Python 3.4.

  1. Ensure division always returns a float with: from __future__ import division
  2. Pass ddof=1 to the std function, i.e. numpy.std(c0, ddof=1). By default, numpy's standard deviation divides by n, whereas with ddof=1 it divides by n-1, which gives the sample standard deviation.
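
To see what the divisor changes, here is a small sketch that uses the standard library's statistics module as a stand-in for numpy: pstdev divides by n, like numpy.std with its default ddof=0, and stdev divides by n-1, like numpy.std with ddof=1. The variable names are only illustrative.

```python
from statistics import pstdev, stdev

c0 = [2, 4, 7, 3, 7, 35, 8, 9]

# pstdev divides by n, like numpy.std(c0) with its default ddof=0.
population_sd = pstdev(c0)
# stdev divides by n - 1, like numpy.std(c0, ddof=1).
sample_sd = stdev(c0)

# The n - 1 divisor always gives the larger value for the same data.
print(population_sd < sample_sd)
```

Using the divide-by-n version by mistake makes the standard deviations too small and therefore inflates d.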

Code

from __future__ import division  # Ensure division returns float
from numpy import mean, std  # version >= 1.7.1 && <= 1.9.1
from math import sqrt


def cohen_d(x, y):
    # Averages the two sample variances, so it assumes len(x) == len(y).
    return (mean(x) - mean(y)) / sqrt((std(x, ddof=1) ** 2 + std(y, ddof=1) ** 2) / 2.0)


if __name__ == "__main__":
    # test conditions
    c0 = [2, 4, 7, 3, 7, 35, 8, 9]
    c1 = [i * 2 for i in c0]
    print(cohen_d(c0, c1))

Output will then be:

-0.556767952265

pds answered Oct 10 '22 13:10