Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python vs Julia autocorrelation

Tags:

python

julia

I am trying to do autocorrelation using Julia and compare it to Python's result. How come they give different results?

Julia code

using StatsBase

t = range(0, stop=10, length=10)
test_data = sin.(exp.(t.^2))

acf = StatsBase.autocor(test_data)

gives

10-element Array{Float64,1}:
  1.0                   
  0.13254954979179642   
 -0.2030283419321465    
  0.00029587850872956104
 -0.06629381497277881   
  0.031309038331589614  
 -0.16633393452504994   
 -0.08482388975165675   
  0.0006905628640697538 
 -0.1443650483145533

Python code

from statsmodels.tsa.stattools import acf
import numpy as np

t = np.linspace(0,10,10)
test_data = np.sin(np.exp(t**2))

acf_result = acf(test_data)

gives

array([ 1.        ,  0.14589844, -0.10412699,  0.07817509, -0.12916543,
       -0.03469143, -0.129255  , -0.15982435, -0.02067688, -0.14633346])
like image 478
Ross Mariano Avatar asked Feb 24 '20 15:02

Ross Mariano


People also ask

Is Python performance comparable to Julia?

When it comes to outright performance, Python is nowhere near to match Julia. The Python community has released multiple patches and updates to bridge the gap to a certain extent. Yet, Python has not been able to reach the performance level of Julia.

How to plot the autocorrelation function for a time series in Python?

We can plot the autocorrelation function for a time series in Python by using the tsaplots.plot_acf () function from the statsmodels library: The x-axis displays the number of lags and the y-axis displays the autocorrelation at that number of lags.

Is Python falling back in the same race as Julia?

Although Julia is attracting some attention and making a name for itself, Python is not falling back in the same race. Whichever language you might opt for, many factors have to be considered since each language has its strengths and drawbacks.

How can I make Julia code in Python?

Julia codes can easily be made by converting C or Python codes. It is very difficult to make Python codes by converting C codes or the other way around. Julia arrays are 1-indexed, i.e. arrays start with 1-n not 0-n. It might cause a problem with programmers having habit of using other languages.


2 Answers

This is because your test_data is different:

Python:

array([ 0.84147098, -0.29102733,  0.96323736,  0.75441021, -0.37291918,
        0.85600145,  0.89676529, -0.34006519, -0.75811102, -0.99910501])

Julia:

[0.8414709848078965, -0.2910273263243299, 0.963237364649543, 0.7544102058854344,
 -0.3729191776326039, 0.8560014512776061, 0.9841238290665676, 0.1665709194875013,
 -0.7581110212957692, -0.9991050130774393]

This happens because you are taking sin of enormous numbers. For example, with the last number in t being 10, exp(10^2) is ~2.7*10^43. At this scale, floating point inaccuracies are about 3*10^9. So if even the least significant bit is different for Python and Julia, the sin value will be way off.

In fact, we can inspect the underlying binary values of the initial array t. For example, they differ in the third last value:

Julia:

julia> reinterpret(Int, range(0, stop=10, length=10)[end-2])
4620443017702830535

Python:

>>> import struct
>>> s = struct.pack('>d', np.linspace(0,10,10)[-3])
>>> struct.unpack('>q', s)[0]
4620443017702830536

We can indeed see that they disagree by exactly one machine epsilon. And if we use Julia take sin of the value obtained by Python:

julia> sin(exp(reinterpret(Float64, 4620443017702830536)^2))
-0.3400651855865199

We get the same value Python does.

like image 57
Jakob Nissen Avatar answered Oct 17 '22 07:10

Jakob Nissen


Just to expand a bit on the answer (adding as an answer as it is too long for a comment). In Julia you have the following:

julia> t = collect(range(0, stop=10, length=10))
10-element Array{Float64,1}:
  0.0               
  1.1111111111111112
  2.2222222222222223
  3.3333333333333335
  4.444444444444445 
  5.555555555555555 
  6.666666666666667 
  7.777777777777778 
  8.88888888888889  
 10.0               

julia> t .- [10*i / 9 for i in 0:9]
10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

while in Python:

>>> t = np.linspace(0,10,10)
>>> t - [10*i/9 for i in range(10)]
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 8.8817842e-16,
       0.0000000e+00, 0.0000000e+00])

and you see that the 8-th number in Python is an inaccurate approximation of 70/9, while in Julia in this case you get the sequence of closest approximations of 10*i/9 using Float64.

So it would seem that because the original sequences differ you the rest follows what @Jakob Nissen commented.

However the things are not that simple. As exp functions in Julia and Python differ a bit in what they produce. See Python:

>>> from math import exp
>>> from mpmath import mp
>>> mp.dps = 1000
>>> float(mp.exp((20/3)**2) - exp((20/3)**2))
-1957.096392544307

while in Julia:

julia> setprecision(1000)
1000

julia> Float64(exp(big((20/3)^2)) - exp((20/3)^2))
2138.903607455693

julia> Float64(exp(big((20/3)^2)) - nextfloat(exp((20/3)^2)))
-1957.096392544307

(you can check that (20/3)^2 is the same Float64 both in Julia and Python).

So in this case with exp Python is slightly more accurate than Julia. Therefore even fixing t (which is easy by using a comprehension in Python instead of linspace) will not make the ACF to be equal.

All in all the conclusion is what @Jakob Nissen commented for such large values the results will be strongly influenced by the numerical inaccuracies.

like image 40
Bogumił Kamiński Avatar answered Oct 17 '22 08:10

Bogumił Kamiński