Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculation on my for loop and want to do it without for loop using some function

Tags:

python

numpy

dec = 0.1
data = np.array([100,200,300,400,500])

I have a for loop like this

y = np.zeros(len(data))
for i in range(len(data)):
    if i == 0:
        y[i] = (1.0 - dec) * data[i]
    else:
        y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])

Output y is:

array([ 90.   , 189.   , 288.9  , 388.89 , 488.889])

And now I want to do the above calculation without a loop, so if I break the code and do

data[0] = (1.0 - dec) * data[0]
data[1:] = (1.0 - dec) * data[1:] + (dec * data[0])

Output data is:

array([ 90, 189, 279, 369, 459])

When you compare y and data output first two values are correct because it is getting multiplied with data[0] which makes sense but later on it should continue as the loop does in loop code, so how can we achieve that? Is there a function that can handle this? This is mainly to optimize my code so that it runs faster for thousands of data.

The expected output is the same as the y output.

like image 553
Akilesh Avatar asked Sep 02 '21 13:09

Akilesh


2 Answers

We can do this with scipy.linalg.toeplitz to make a matrix of shifts of the data and then multiplying that by powers of dec and summing columns:

import numpy as np
from scipy.linalg import toeplitz

dec = 0.1
data = np.array([100,200,300,400,500])

decs = np.power(dec, np.arange(len(data)))

r = np.zeros_like(data)
r[0] = data[0]
toep = toeplitz(r, data)

output = (1 - dec) * np.sum(toep * decs.reshape(-1, 1), axis=0)

First decs is a vector of powers of dec:

print(decs) 
#[1.e+00 1.e-01 1.e-02 1.e-03 1.e-04]

Next, we use toeplitz to make a matrix of shifts of data:

print(toep)
#[[100 200 300 400 500]
# [  0 100 200 300 400]
# [  0   0 100 200 300]
# [  0   0   0 100 200]
# [  0   0   0   0 100]])

Finally we reshape decs into a column, multiply it by toep and sum along columns. This result needs to be scaled by 1 - dec.

This all works because we can rewrite our definition of data[i] as a sum instead of recursively:

y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])
y[i] = (1.0 - dec) * data[i] + (dec * ((1.0 - dec) * data[i - 1] + (dec * y[i - 2])))
...
y[i] = (1.0 - dec) * (data[i] + dec * data[i - 1] + dec ** 2 * data[i - 2] + ... dec ** i * data[0])
y[i] = (1.0 - dec) * sum(dec ** j * data[i - j] for j in range(i + 1))

This can be proven by induction.

From there it follows from rewriting those sums as columns of a matrix and translating that matrix to a calculation in numpy/scipy.

like image 85
Kyle Parsons Avatar answered Oct 11 '22 16:10

Kyle Parsons


I know that you said not to use python for loop

But also np.vectorize is not real vectorization(it will not move your code C) it is only convenience

Since you said 1000s of data so you should try numba, as it moves for loop to machine code

As of now I am not able to think of getting the same correct output only using numpy ufuncs(np.add, np.dot, etc) only, since ufuncs are known to vectorize(real simd vectorization) depending if compiler can do it, so maybe for now you can try numba

import numba as nb
import numpy as np

dec = 0.1
data = np.array([100,200,300,400,500])
@nb.jit((nb.int64[:],))
def f(data):
    y = np.zeros(data.shape[0])
    for i in range(y.shape[0]):
        if i == 0:
            y[i] = (1.0 - dec) * data[i]
        else:
            y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])
    return y
print(f(data))

It is also possible to parallelize(here), but I am not sure how to do it correctly in this case. Also infact I doubt if vectorization and also parallelism is possible to this problem without adding more complicated code

like image 2
eroot163pi Avatar answered Oct 11 '22 14:10

eroot163pi