Looking to make this calculation as fast as possible. I have X as an n x m numpy array. I want to define Y to be the following:
Y_11 = 1 / (exp(X_11 - X_11) + exp(X_11 - X_12) + ... + exp(X_11 - X_1m)).
or, in numpy terms, for Y[0,0]:
1/np.sum(np.exp(X[0,0]-X[0,:]))
So basically, Y is also n x m, where the (i, j) element is 1 / sum_j' exp(X_ij - X_ij').
Any tips would be great! Thanks.
Sample code as requested:
import numpy as np

np.random.seed(111)
J,K = 110,120
X = np.random.rand(J,K)
Y = np.zeros((J,K))
for j in range(J):
    for k in range(K):
        Y[j,k] = 1/np.sum(np.exp(X[j,k]-X[j,:]))

# note each row will sum to 1 under this operation
np.sum(Y,axis=1)
Here are a few fully vectorized approaches -
def vectorized_app1(X):
    return 1/np.exp(X[:,None] - X[...,None]).sum(1)

def vectorized_app2(X):
    exp_vals = np.exp(X)
    return 1/(exp_vals[:,None]/exp_vals[...,None]).sum(1)

def vectorized_app3(X):
    exp_vals = np.exp(X)
    return 1/np.einsum('ij,ik->ij',exp_vals,1/exp_vals)
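A quick sanity check (wrapping the question's double loop as original_approach, as used in the timings below) confirms all three match the reference:

import numpy as np

def original_approach(X):
    # reference double loop from the question
    J, K = X.shape
    Y = np.zeros((J, K))
    for j in range(J):
        for k in range(K):
            Y[j, k] = 1/np.sum(np.exp(X[j, k] - X[j, :]))
    return Y

np.random.seed(111)
X = np.random.rand(110, 120)
Y_ref = original_approach(X)
print(np.allclose(Y_ref, vectorized_app1(X)))  # True
print(np.allclose(Y_ref, vectorized_app2(X)))  # True
print(np.allclose(Y_ref, vectorized_app3(X)))  # True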
Tricks used and lessons learnt (each one is demonstrated in the short sketch after this list):

- Extend dimensions with None/np.newaxis to bring in broadcasting and perform everything in a vectorized manner.
- np.exp is an expensive operation, so using it on a huge broadcasted array would be costly. The trick used is: exp(A-B) = exp(A)/exp(B). Thus, we perform np.exp(X) upfront and then perform broadcasted divisions.
- Those sum-reductions can be implemented with np.einsum. This brings in memory efficiency, as we won't have to create huge broadcasted arrays, and this results in a massive performance boost.
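To make these concrete, here's a minimal sketch of the shapes and identities involved:

import numpy as np

X = np.random.rand(110, 120)                 # (J, K)

# Trick 1: None/np.newaxis inserts a length-1 axis, so broadcasting
# builds all pairwise differences within each row:
print(X[:, None].shape)                      # (110, 1, 120)
print(X[..., None].shape)                    # (110, 120, 1)
print((X[:, None] - X[..., None]).shape)     # (110, 120, 120)

# Trick 2: exp(A-B) = exp(A)/exp(B), so np.exp runs once on the
# (J, K) input instead of on the (J, K, K) broadcasted array:
exp_vals = np.exp(X)
diffs = np.exp(X[:, None] - X[..., None])
ratios = exp_vals[:, None] / exp_vals[..., None]
print(np.allclose(diffs, ratios))            # True

# Trick 3: einsum fuses the multiply and the sum-reduction without
# materializing the (J, K, K) intermediate:
out_sum = ratios.sum(1)
out_einsum = np.einsum('ij,ik->ij', exp_vals, 1/exp_vals)
print(np.allclose(out_sum, out_einsum))      # True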
Runtime test -
In [111]: # Setup input, X
...: np.random.seed(111)
...: J,K = 110,120
...: X = np.random.rand(J,K)
...:
In [112]: %timeit original_approach(X)
1 loop, best of 3: 322 ms per loop
In [113]: %timeit vectorized_app1(X)
10 loops, best of 3: 124 ms per loop
In [114]: %timeit vectorized_app2(X)
100 loops, best of 3: 14.6 ms per loop
In [115]: %timeit vectorized_app3(X)
100 loops, best of 3: 3.01 ms per loop
Looks like einsum showed its magic again with a 100x+ speedup!
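As a side note going beyond the answer above: pushing the exp(A-B) = exp(A)/exp(B) identity all the way, the j'-sum factors out of each element entirely, so Y reduces to a row-wise softmax of -X computed in O(J*K); the names Y_fact and Y_einsum below are just for illustration:

import numpy as np

np.random.seed(111)
X = np.random.rand(110, 120)

# Y[j,k] = 1/(exp(X[j,k]) * sum_k' exp(-X[j,k'])), so one exp call
# and one row-sum suffice:
E = np.exp(-X)
Y_fact = E / E.sum(axis=1, keepdims=True)

exp_vals = np.exp(X)
Y_einsum = 1/np.einsum('ij,ik->ij', exp_vals, 1/exp_vals)
print(np.allclose(Y_fact, Y_einsum))         # True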
Here's a first step at reducing the double loop:
def foo2(X):
    Y = np.zeros_like(X)
    for k in range(X.shape[1]):
        Y[:,k] = 1/np.exp(X[:,[k]]-X[:,:]).sum(axis=1)
    return Y
I suspect I can also remove the k loop, but I have to spend more time figuring out just what it is doing. That X[:,[k]]-X[:,:] isn't obvious (in the big picture).
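For what it's worth, X[:,[k]] indexes with a one-element list, so unlike X[:,k] it keeps the column axis; the resulting (J, 1) column then broadcasts against the full (J, K) slice. A tiny illustration:

import numpy as np

X = np.arange(6).reshape(2, 3)
print(X[:, 1].shape)     # (2,)   -- plain integer index drops the axis
print(X[:, [1]].shape)   # (2, 1) -- list index keeps it
print(X[:, [1]] - X)     # column 1 minus every column, row by row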
Another step:
Z = np.stack([X[:,[k]]-X for k in range(X.shape[1])],2)
Y = 1/np.exp(Z).sum(axis=1)
which can be further refined (with too much trial and error) with
Z = X[:,None,:]-X[:,:,None]
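A quick check (same setup as the question) that the broadcasted Z matches the stacked one and still yields rows summing to 1:

import numpy as np

np.random.seed(111)
X = np.random.rand(110, 120)

Z_stack = np.stack([X[:, [k]] - X for k in range(X.shape[1])], 2)
Z_bcast = X[:, None, :] - X[:, :, None]
print(np.allclose(Z_stack, Z_bcast))     # True

Y = 1/np.exp(Z_bcast).sum(axis=1)
print(np.allclose(Y.sum(axis=1), 1))     # True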