I want to implement the following calculation in NumPy, and here is my code. It uses a single for loop. I am wondering if there is a more efficient way of doing this calculation? I'd really appreciate any help!
k, d = X.shape
m = Y.shape[0]
c1 = 2.0*sigma**2
c2 = 0.5*np.log(np.pi*c1)
c3 = np.log(1.0/k)
L_B = np.zeros((m,))
for i in range(m):
    if i % 100 == 0:
        print(i)  # progress indicator
    L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
        np.power(X-Y[i,:],2), c1)-c2,1)+c3)))
print(np.mean(L_B))
I've thought of creating a 3D tensor with np.expand_dims(X, 2).repeat(Y.shape[0], 2) - Y so that the calculation can be done by broadcasting, but that would waste a lot of memory when m is large.
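For example (rough numbers of my own, just to quantify the concern), the float64 intermediate of shape (k, d, m) alone would need:

k, d, m = 100, 10, 100000          # illustrative sizes, not my actual data
print(k * d * m * 8 / 1e9, "GB")   # -> 0.8 GB for a single temporary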
I also believe that np.einsum() uses nothing but a for loop internally, so it might not be that efficient; correct me if I am wrong.
Any thoughts?
Some have observed np.vectorize to be faster than normal for loops, but even the NumPy documentation states: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
The speedups instead come from NumPy's array model: a Python list can hold elements of different data types, which puts lots of extra constraints on any computation over it, whereas NumPy functions operate on homogeneous arrays, are implemented in C, and can hand whole-array operations to optimized low-level routines. That makes them far faster than equivalent Python loops.
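As a quick illustration of that documentation note (a toy sketch added here for context, with arbitrary sizes; exact timings will vary):

import numpy as np

x = np.random.rand(1000000)
f_slow = np.vectorize(lambda v: v*v)   # still calls the Python lambda once per element
out_slow = f_slow(x)
out_fast = x*x                         # one compiled C loop over the whole array
assert np.allclose(out_slow, out_fast)
# timing these two shows the ufunc version is typically orders of magnitude faster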
Optimization Stage #1
My first level of optimization is a direct translation of the loopy code to a broadcasting-based one. It introduces a new axis and as such is not very memory efficient, as listed below -
p1 = (-((X[:,None] - Y)**2)/c1)-c2   # broadcasted differences, shape (k, m, d)
p11 = p1.sum(2)                      # reduce along d, shape (k, m)
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()       # reduce along k, then average over m
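As an aside on the einsum question from the post (my own note, outside the staged answer): np.einsum dispatches to compiled C loops rather than a Python-level for loop, and the squared distances above can also be formed without the (k, m, d) temporary via the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y:

# row-wise squared norms via einsum, then an outer expansion; shape (k, m)
sq_dists = (np.einsum('ij,ij->i', X, X)[:, None]
            + np.einsum('ij,ij->i', Y, Y)[None, :]
            - 2.0*X.dot(Y.T))

Up to floating-point error this gives the same matrix as cdist(X, Y, 'sqeuclidean') used further below, so it is mostly useful if you want to avoid the SciPy dependency.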
Optimization Stage #2
Bringing in a few optimizations, keeping in mind that the operations on the constants can be separated out, I ended up with the following -
c10 = -c1
c20 = X.shape[1]*c2            # d*c2, pulled out of the inner reduction
subs = (X[:,None] - Y)**2      # shape (k, m, d)
p00 = subs.sum(2)              # squared euclidean distances, shape (k, m)
p10 = p00/c10
p11 = p10-c20
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()
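To make the constant separation explicit (a derivation I'm adding for clarity, using the definitions above, with $s_{k'i} = \sum_j (X_{k'j} - Y_{ij})^2$):

$$
L_B[i] = \log\sum_{k'=1}^{k} \exp\!\left(-\frac{s_{k'i}}{c_1} - d\,c_2 + c_3\right)
       = \log\sum_{k'=1}^{k} \exp\!\left(-\frac{s_{k'i}}{c_1}\right) + c_3 - d\,c_2
$$

Only the distance-dependent term has to stay inside the exp/sum; everything else is a constant that can be applied once at the end, which is exactly what the next stage exploits.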
Optimization Stage #3
Going further with it and and seeing the places where the operations could be optimized, I ended up using Scipy's cdist
to replace the heavy-weight work of the squaring and sum-reduction
. This should be pretty memory efficient and gave us the final implementation, as shown below -
from scipy.spatial.distance import cdist
# Setup constants
c10 = -c1
c20 = X.shape[1]*c2
c30 = c20-c3   # combined constant; np.log(np.exp(c30)) would just round-trip back to c30
# Get stagewise operations corresponding to loopy ones
p1 = cdist(X, Y, 'sqeuclidean')   # pairwise squared euclidean distances, shape (k, m)
p2 = np.exp(p1/c10).sum(0)
out = np.log(p2).mean() - c30
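One numerical caveat worth adding (my note, not part of the original answer): np.exp(p1/c10) can underflow to zero when every row of X is far from a given row of Y, turning the log into -inf. scipy.special.logsumexp computes the same log-sum in a numerically stable way, reusing p1, c10 and c30 from above:

from scipy.special import logsumexp

# stable equivalent of np.log(np.exp(p1/c10).sum(0)).mean() - c30;
# logsumexp subtracts the columnwise max before exponentiating
out_stable = logsumexp(p1/c10, axis=0).mean() - c30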
Runtime test
Approaches -
def loopy_app(X, Y, sigma):
    k, d = X.shape
    m = Y.shape[0]
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    L_B = np.zeros((m,))
    for i in range(m):
        L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
            np.power(X-Y[i,:],2), c1)-c2,1)+c3)))
    return np.mean(L_B)
def vectorized_app(X, Y, sigma):
    # Setup constants
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    c10 = -c1
    c20 = X.shape[1]*c2
    c30 = c20-c3
    # Get stagewise operations corresponding to loopy ones
    p1 = cdist(X, Y, 'sqeuclidean')
    p2 = np.exp(p1/c10).sum(0)
    out = np.log(p2).mean() - c30
    return out
Timings and verification -
In [294]: # Setup inputs with m (=Y.shape[0]) being a large number
...: X = np.random.randint(0,9,(100,10))
...: Y = np.random.randint(0,9,(10000,10))
...: sigma = 2.34
...:
In [295]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[295]: True
In [296]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 225 ms per loop
In [297]: %timeit vectorized_app(X, Y, sigma)
10 loops, best of 3: 23.6 ms per loop
In [298]: # Setup inputs with m (=Y.shape[0]) being a much larger number
...: X = np.random.randint(0,9,(100,10))
...: Y = np.random.randint(0,9,(100000,10))
...: sigma = 2.34
...:
In [299]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[299]: True
In [300]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 2.27 s per loop
In [301]: %timeit vectorized_app(X, Y, sigma)
1 loops, best of 3: 243 ms per loop
Around 10x speedup there!
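If m grows to the point where even the (k, m) distance matrix from cdist is too big to hold at once, a chunked variant is a straightforward extension (a sketch of my own with an arbitrary chunk size; it was not part of the timed runs above):

import numpy as np
from scipy.spatial.distance import cdist

def chunked_app(X, Y, sigma, chunk=10000):
    k, d = X.shape
    c1 = 2.0*sigma**2
    c30 = d*0.5*np.log(np.pi*c1) - np.log(1.0/k)
    total = 0.0
    for s in range(0, Y.shape[0], chunk):
        p1 = cdist(X, Y[s:s+chunk], 'sqeuclidean')      # shape (k, <=chunk)
        total += np.log(np.exp(p1/(-c1)).sum(0)).sum()  # accumulate per-row log terms
    return total/Y.shape[0] - c30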