I am trying to do autocorrelation using Julia and compare it to Python's result. How come they give different results?
Julia code
using StatsBase
t = range(0, stop=10, length=10)
test_data = sin.(exp.(t.^2))
acf = StatsBase.autocor(test_data)
gives
10-element Array{Float64,1}:
1.0
0.13254954979179642
-0.2030283419321465
0.00029587850872956104
-0.06629381497277881
0.031309038331589614
-0.16633393452504994
-0.08482388975165675
0.0006905628640697538
-0.1443650483145533
Python code
from statsmodels.tsa.stattools import acf
import numpy as np
t = np.linspace(0,10,10)
test_data = np.sin(np.exp(t**2))
acf_result = acf(test_data)
gives
array([ 1. , 0.14589844, -0.10412699, 0.07817509, -0.12916543,
-0.03469143, -0.129255 , -0.15982435, -0.02067688, -0.14633346])
When it comes to outright performance, Python is nowhere near to match Julia. The Python community has released multiple patches and updates to bridge the gap to a certain extent. Yet, Python has not been able to reach the performance level of Julia.
We can plot the autocorrelation function for a time series in Python by using the tsaplots.plot_acf () function from the statsmodels library: The x-axis displays the number of lags and the y-axis displays the autocorrelation at that number of lags.
Although Julia is attracting some attention and making a name for itself, Python is not falling back in the same race. Whichever language you might opt for, many factors have to be considered since each language has its strengths and drawbacks.
Julia codes can easily be made by converting C or Python codes. It is very difficult to make Python codes by converting C codes or the other way around. Julia arrays are 1-indexed, i.e. arrays start with 1-n not 0-n. It might cause a problem with programmers having habit of using other languages.
This is because your test_data
is different:
Python:
array([ 0.84147098, -0.29102733, 0.96323736, 0.75441021, -0.37291918,
0.85600145, 0.89676529, -0.34006519, -0.75811102, -0.99910501])
Julia:
[0.8414709848078965, -0.2910273263243299, 0.963237364649543, 0.7544102058854344,
-0.3729191776326039, 0.8560014512776061, 0.9841238290665676, 0.1665709194875013,
-0.7581110212957692, -0.9991050130774393]
This happens because you are taking sin
of enormous numbers. For example, with the last number in t
being 10, exp(10^2)
is ~2.7*10^43. At this scale, floating point inaccuracies are about 3*10^9. So if even the least significant bit is different for Python and Julia, the sin
value will be way off.
In fact, we can inspect the underlying binary values of the initial array t
. For example, they differ in the third last value:
Julia:
julia> reinterpret(Int, range(0, stop=10, length=10)[end-2])
4620443017702830535
Python:
>>> import struct
>>> s = struct.pack('>d', np.linspace(0,10,10)[-3])
>>> struct.unpack('>q', s)[0]
4620443017702830536
We can indeed see that they disagree by exactly one machine epsilon. And if we use Julia take sin
of the value obtained by Python:
julia> sin(exp(reinterpret(Float64, 4620443017702830536)^2))
-0.3400651855865199
We get the same value Python does.
Just to expand a bit on the answer (adding as an answer as it is too long for a comment). In Julia you have the following:
julia> t = collect(range(0, stop=10, length=10))
10-element Array{Float64,1}:
0.0
1.1111111111111112
2.2222222222222223
3.3333333333333335
4.444444444444445
5.555555555555555
6.666666666666667
7.777777777777778
8.88888888888889
10.0
julia> t .- [10*i / 9 for i in 0:9]
10-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
while in Python:
>>> t = np.linspace(0,10,10)
>>> t - [10*i/9 for i in range(10)]
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 8.8817842e-16,
0.0000000e+00, 0.0000000e+00])
and you see that the 8-th number in Python is an inaccurate approximation of 70/9
, while in Julia in this case you get the sequence of closest approximations of 10*i/9
using Float64
.
So it would seem that because the original sequences differ you the rest follows what @Jakob Nissen commented.
However the things are not that simple. As exp
functions in Julia and Python differ a bit in what they produce. See Python:
>>> from math import exp
>>> from mpmath import mp
>>> mp.dps = 1000
>>> float(mp.exp((20/3)**2) - exp((20/3)**2))
-1957.096392544307
while in Julia:
julia> setprecision(1000)
1000
julia> Float64(exp(big((20/3)^2)) - exp((20/3)^2))
2138.903607455693
julia> Float64(exp(big((20/3)^2)) - nextfloat(exp((20/3)^2)))
-1957.096392544307
(you can check that (20/3)^2
is the same Float64
both in Julia and Python).
So in this case with exp
Python is slightly more accurate than Julia. Therefore even fixing t
(which is easy by using a comprehension in Python instead of linspace
) will not make the ACF to be equal.
All in all the conclusion is what @Jakob Nissen commented for such large values the results will be strongly influenced by the numerical inaccuracies.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With