I have the following Python code and output:
>>> import numpy as np
>>> s = [12.40265325, -1.3362417499999921, 6.8768662500000062, 25.673127166666703, 19.733372250000002, 21.649556250000003, 7.1676752500000021, -0.85349583333329804, 23.130314250000012, 20.074925250000007, -0.29701574999999281, 17.078694250000012, 3.3652611666666985, 19.491246250000003, -0.76856974999999039, -1.8838917499999965, -6.8547018333333085, 4.5195052500000088, 5.9882702500000136, -9.5889237499999922, 13.98170916666669, -2.3662137499999929, 12.111165250000013, -6.8334957499999902, -21.379336749999993, 8.4651301666666967, 2.5094612500000082, -0.21386274999998989, 5.1226162500000072, 14.283680166666699, -4.3340977499999909, -2.7831607499999933, 8.2339832500000085, -12.841856749999991, -6.4984398333333075, -6.2773697499999912, -13.638411749999996, -15.90088974999999, -8.2505068333333043, -19.616496749999996, -4.4346607499999919, -10.056376749999991, -13.581729833333299, -8.2284047499999957, -4.5957137499999945, -5.3758427499999968, -12.254779833333302, 11.207287250000007, -12.848971749999997, -14.449801749999992, -17.247984749999993, -17.475253833333305]
>>> np.mean(s)
1.3664283380001927e-14
>>> np.std(s)
12.137473069268983
>>> (s - np.mean(s)) / np.std(s)
array([ 1.02184806, -0.11009225, 0.56658138, 2.1151954 , ...
When I run this in R, the results do not match:
> options(digits=16)
> s = c(12.40265325, -1.3362417499999921, 6.8768662500000062, 25.673127166666703, 19.733372250000002, 21.649556250000003, 7.1676752500000021, -0.85349583333329804, 23.130314250000012, 20.074925250000007, -0.29701574999999281, 17.078694250000012, 3.3652611666666985, 19.491246250000003, -0.76856974999999039, -1.8838917499999965, -6.8547018333333085, 4.5195052500000088, 5.9882702500000136, -9.5889237499999922, 13.98170916666669, -2.3662137499999929, 12.111165250000013, -6.8334957499999902, -21.379336749999993, 8.4651301666666967, 2.5094612500000082, -0.21386274999998989, 5.1226162500000072, 14.283680166666699, -4.3340977499999909, -2.7831607499999933, 8.2339832500000085, -12.841856749999991, -6.4984398333333075, -6.2773697499999912, -13.638411749999996, -15.90088974999999, -8.2505068333333043, -19.616496749999996, -4.4346607499999919, -10.056376749999991, -13.581729833333299, -8.2284047499999957, -4.5957137499999945, -5.3758427499999968, -12.254779833333302, 11.207287250000007, -12.848971749999997, -14.449801749999992, -17.247984749999993, -17.475253833333305)
> mean(s)
[1] 1.243449787580175e-14
> sd(s)
[1] 12.25589024484334
> (s - mean(s)) / sd(s)
[1] 1.01197489551755737 -0.10902853430514588 0.56110703609584245 2.09475824715945480 ...
I know the differences are pretty minor, but this is a bit of an issue with my application. Also of note, the R results match Stata's results too.
Note: I'm using Python 2.7.2, NumPy 1.6.1, R 2.15.2 GUI 1.53 Leopard build 64-bit (6335)
For the std, which is clearly off by a substantial amount: NumPy's std returns sqrt(sum((x - x.mean())**2) / (n - ddof)), where ddof=0 by default. I guess R assumes ddof=1, because (with s converted to a NumPy array):
In [7]: s.std()
Out[7]: 12.137473069268983
In [8]: s.std(ddof=1)
Out[8]: 12.255890244843339
and:
> sd(s)
[1] 12.25589
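To make the ddof point concrete, here is a small sketch (reusing the s values from above; the variable names are mine, not from the question) that applies the formula by hand and compares it to np.std with both ddof settings:

import numpy as np

s = np.array(s)                                   # the list of values from the question
n = s.size
resid = s - s.mean()

std_pop = np.sqrt(np.sum(resid**2) / n)           # ddof=0: numpy's default (population std)
std_sample = np.sqrt(np.sum(resid**2) / (n - 1))  # ddof=1: what R's sd() and Stata use

print(std_pop,    np.std(s))                      # both ~12.137473069268983
print(std_sample, np.std(s, ddof=1))              # both ~12.255890244843339, matching R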
I can't explain the mean, but since it's basically zero in each case, I would put it down to precision issues. NumPy would report the two values as 'close enough' under its default tolerances:
In [5]: np.isclose(s.mean(), 1.24345e-14)
Out[5]: True
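The likely mechanism (my guess, not something either tool documents for this data) is that the exact mean is essentially zero, so what gets printed is just accumulated rounding error, and different summation orders land on different tiny values. A rough sketch:

import math
import numpy as np

s = np.array(s)                       # the values from the question

m_np   = s.mean()                     # numpy's accumulation order
m_fsum = math.fsum(s) / len(s)        # correctly rounded sum via math.fsum
m_rev  = s[::-1].sum() / len(s)       # same data, summed in reverse order

print(m_np, m_fsum, m_rev)            # all on the order of 1e-14, typically not identical
print(np.isclose(m_np, m_fsum))       # True under the default tolerances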
The other answers discuss this issue better than I can.
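If the goal is simply to make the Python output line up with R and Stata, the practical change (my suggestion, not from the question) is to standardize with the sample standard deviation:

import numpy as np

s = np.array(s)                       # the values from the question
z = (s - s.mean()) / s.std(ddof=1)    # sample std, like R's sd()
print(z[:4])                          # approximately [1.01197490, -0.10902853, 0.56110704, 2.09475825]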