I am interested in generating an array(or numpy Series) of length N that will exhibit specific autocorrelation at lag 1. Ideally, I want to specify the mean and variance, as well, and have the data drawn from (multi)normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy, or scikit-learn?
Just to be explicit and precise, this is the autocorrelation I want to control:
numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]
If you are interested only in the auto-correlation at lag one, you can generate an auto-regressive process of order one with the parameter equal to the desired auto-correlation; this property is mentioned on the Wikipedia page, but it's not hard to prove it.
Here is some sample code:
import numpy as np
def sample_signal(n_samples, corr, mu=0, sigma=1):
assert 0 < corr < 1, "Auto-correlation must be between 0 and 1"
# Find out the offset `c` and the std of the white noise `sigma_e`
# that produce a signal with the desired mean and variance.
# See https://en.wikipedia.org/wiki/Autoregressive_model
# under section "Example: An AR(1) process".
c = mu * (1 - corr)
sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))
# Sample the auto-regressive process.
signal = [c + np.random.normal(0, sigma_e)]
for _ in range(1, n_samples):
signal.append(c + corr * signal[-1] + np.random.normal(0, sigma_e))
return np.array(signal)
def compute_corr_lag_1(signal):
return np.corrcoef(signal[:-1], signal[1:])[0][1]
# Examples.
print(compute_corr_lag_1(sample_signal(5000, 0.5)))
print(np.mean(sample_signal(5000, 0.5, mu=2)))
print(np.std(sample_signal(5000, 0.5, sigma=3)))
The parameter corr lets you set the desired auto-correlation at lag one and the optional parameters, mu and sigma, let you control the mean and standard deviation of the generated signal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With