I need to calculate Cohen's d to determine the effect size of an experiment. Is there an implementation in an established library I could use? If not, what would be a good implementation?
For the independent-samples t-test, Cohen's d is determined by calculating the mean difference between your two groups and then dividing the result by the pooled standard deviation.
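For reference, that definition written out in symbols (this is just a restatement of the pooled-SD method implemented in the code further down, not a different formula) is

d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

where n_1, n_2 are the group sizes and s_1, s_2 the sample standard deviations.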
To calculate an effect size, called Cohen's d, for the one-sample t-test, you divide the mean difference by the standard deviation of the difference, as shown in the sketch below. Note that here sd(x - mu) = sd(x); mu is the theoretical mean against which the mean of our sample is compared (the default value is mu = 0).
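A minimal sketch of that one-sample calculation (the function name cohen_d_one_sample and the argument mu are illustrative, not from any library):

from statistics import mean, stdev

def cohen_d_one_sample(x, mu=0.0):
    # mean difference between the sample and the theoretical mean mu,
    # divided by the sample standard deviation (sd(x - mu) = sd(x))
    return (mean(x) - mu) / stdev(x)

# example: compare the sample below against a theoretical mean of 5
print(cohen_d_one_sample([2, 4, 7, 3, 7, 35, 8, 9], mu=5))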
When the standard deviations of both sets of observations are equal, Cohen's d_av and Cohen's d_rm are identical, and the effect size equals Cohen's d_s for the same means and standard deviations in a between-subject design.
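To make those quantities concrete, here is a hedged sketch of Cohen's d_av and d_rm for paired (within-subject) observations, following one common formulation (e.g. Lakens, 2013); the function names are illustrative, not from a library:

from statistics import mean, stdev
from math import sqrt

def cohens_d_av(pre, post):
    # mean difference divided by the average of the two standard deviations
    return (mean(post) - mean(pre)) / ((stdev(pre) + stdev(post)) / 2)

def cohens_d_rm(pre, post, r):
    # mean difference standardized by the SD of the difference scores,
    # then corrected for the correlation r between the paired measurements
    s_diff = sqrt(stdev(pre) ** 2 + stdev(post) ** 2
                  - 2 * r * stdev(pre) * stdev(post))
    return ((mean(post) - mean(pre)) / s_diff) * sqrt(2 * (1 - r))

With equal standard deviations both expressions reduce to the mean difference divided by that common standard deviation, which is why they coincide with Cohen's d_s in that case.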
The simple averaged-variance formula (the 1st method in the code below) is correct only in the special case that the two groups have equal size. A more general solution, based on the formulas found on Wikipedia and in Robert Coe's article, is the cohen_d function (the 2nd method) shown below. Be aware that its denominator is the pooled standard deviation, which is generally only appropriate if the population standard deviation is assumed equal for both groups:
from numpy import std, mean, sqrt

# correct if the population S.D. is expected to be equal for the two groups
def cohen_d(x, y):
    nx = len(x)
    ny = len(y)
    dof = nx + ny - 2
    return (mean(x) - mean(y)) / sqrt(((nx - 1) * std(x, ddof=1) ** 2 + (ny - 1) * std(y, ddof=1) ** 2) / dof)

# dummy data
x = [2, 4, 7, 3, 7, 35, 8, 9]
y = [i * 2 for i in x]
# extra element so that the two group sizes are not equal
x.append(10)

# 1st method: correct only if nx == ny
d = (mean(x) - mean(y)) / sqrt((std(x, ddof=1) ** 2 + std(y, ddof=1) ** 2) / 2.0)
print("d by the 1st method = " + str(d))
if len(x) != len(y):
    print("The first method is incorrect because nx is not equal to ny.")

# 2nd method: correct for the more general case, including nx != ny
print("d by the more general 2nd method = " + str(cohen_d(x, y)))
Output will be:
d by the 1st method = -0.559662109472
The first method is incorrect because nx is not equal to ny.
d by the more general 2nd method = -0.572015604666
Since Python 3.4, you can use the statistics module for calculating spread and average metrics. With it, Cohen's d can be calculated easily:
from statistics import mean, stdev
from math import sqrt
# test conditions
c0 = [2, 4, 7, 3, 7, 35, 8, 9]
c1 = [i * 2 for i in c0]
cohens_d = (mean(c0) - mean(c1)) / (sqrt((stdev(c0) ** 2 + stdev(c1) ** 2) / 2))
print(cohens_d)
Output:
-0.5567679522645598
So we observe a medium effect.
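The "medium" label here follows Cohen's conventional rule-of-thumb benchmarks (roughly 0.2 small, 0.5 medium, 0.8 large). A tiny helper along those lines, purely illustrative and not from any library:

def effect_size_label(d):
    # conventional benchmarks; treat them as rough guidance, not hard rules
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

print(effect_size_label(-0.5568))  # -> medium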
In Python 2.7, you can use numpy with a couple of caveats, as I discovered while adapting Bengt's answer from Python 3.4. First, you have to add from __future__ import division so that division returns a float. Second, you have to pass ddof=1 into the std function, i.e. numpy.std(c0, ddof=1): numpy's default behaviour is to divide by n, whereas with ddof=1 it divides by n - 1.
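A quick way to see that ddof caveat in action (just a sanity check; it runs on both Python 2.7 and 3):

from numpy import std

c0 = [2, 4, 7, 3, 7, 35, 8, 9]
print(std(c0))          # default ddof=0: divides by n
print(std(c0, ddof=1))  # divides by n - 1, matching the ddof=1 used in the answers above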
Code:
from __future__ import division  # ensure division returns a float
from numpy import mean, std  # numpy version >= 1.7.1 and <= 1.9.1
from math import sqrt

def cohen_d(x, y):
    # simple averaged-variance form; assumes the two groups have equal size
    # (true for c0 and c1 below)
    return (mean(x) - mean(y)) / sqrt((std(x, ddof=1) ** 2 + std(y, ddof=1) ** 2) / 2.0)

if __name__ == "__main__":
    # test conditions
    c0 = [2, 4, 7, 3, 7, 35, 8, 9]
    c1 = [i * 2 for i in c0]
    print(cohen_d(c0, c1))
Output will then be:
-0.556767952265