Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python vectorization with a constant

I have a series X of length n(=300,000). Using a window length of w (=40), I need to implement:

mu(i)= X(i)-X(i-w)

s(i) = sum{k=i-w to i} [X(k)-X(k-1) - mu(i)]^2

I was wondering if there's a way to prevent loops here. The fact that mu(i) is constant in second equation is causing complications in vectorization. I did the following so far:

x1=x.shift(1)
xw=x.shift(w)
mu= x-xw 
dx=(x-x1-mu)**2 # wrong because mu wouldn't be constant for each i
s=pd.rolling_sum(dx,w)

The above code would work (and was working) in a loop setting but takes too long, so any help regarding vectorization or other speed improvement methods would be helpful. I posted this on crossvalidated with mathjax formatting but that doesn't seem to work here.

https://stats.stackexchange.com/questions/241050/python-vectorization-with-a-constant

Also just to clarify, I wasn't using a double loop, just a single one originally:

        for i in np.arange(w, len(X)):
            x=X.ix[i-w:i,0] # clip a series of size w
            x1=x.shift(1)   
            mu.ix[i]= x.ix[-1]-x.ix[0]   
            temp= (x-x1-mu.ix[i])**2   # returns a series of size w but now mu is constant
            s.ix[i]= temp.sum()
like image 319
dayum Avatar asked May 20 '26 23:05

dayum


1 Answers

Approach #1 : One vectorized approach would be using broadcasting -

N = X.shape[0]
a = np.arange(N)
k2D = a[:,None] - np.arange(w+1)[::-1]
mu1D = X - X[a-w]
out = ((X[k2D] - X[k2D-1] - mu1D[:,None])**2).sum(-1)

We can further optimize the last step to get squared summations with np.einsum -

subs = X[k2D] - X[k2D-1] - mu1D[:,None]
out = np.einsum('ij,ij->i',subs,subs)

Further improvement is possible with the use of NumPy strides to get X[k2D] and X[k2D-1].


Approach #2 : To save on memory when working very large arrays, we can use one loop instead of two loops used in the original code, like so -

N = X.shape[0]
s = np.zeros((N))
k_idx = np.arange(-w,1)
for i in range(N):
    mu = X[i]-X[i-w]
    s[i] = ((X[k_idx]-X[k_idx-1] - mu)**2).sum()
    k_idx += 1

Again, np.einsum could be used here to compute s[i], like so -

subs = X[k_idx]-X[k_idx-1] - mu
s[i] = np.einsum('i,i->',subs,subs)
like image 171
Divakar Avatar answered May 22 '26 14:05

Divakar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!