I have been trying for quite some time to implement my code to run on GPU, however with little success. I would really appreciate someone helping with the implementation.
Let me say a few words about the problem. I have a graph G with N nodes and a distribution mx on each node x. For every edge (x,y) of the graph, I would like to compute the distance between the distributions mx and my. For a given pair (x,y), I use ot.sinkhorn(mx, my, dNxNy) from the Python POT package to compute the distance. Here mx and my are vectors of size Nx and Ny on nodes x and y, and dNxNy is an Nx x Ny distance matrix.
Now, I discovered that there is a GPU implementation of this code, ot.gpu.sinkhorn(mx, my, dNxNy). However, this is not good enough, because mx, my and dNxNy would need to be uploaded to the GPU at every iteration, which is a massive overhead. So the idea is to parallelise this over all edges on the GPU.
The essence of the code is as follows. mx_all holds all the distributions.

for i, e in enumerate(G.edges):
    W[i] = W_comp(mx_all, dist, e)

def W_comp(mx_all, dist, e):
    i = e[0]
    j = e[1]
    # support (node indices) and weights of the two distributions
    Nx = np.array(mx_all[i][1]).flatten()
    Ny = np.array(mx_all[j][1]).flatten()
    mx = np.array(mx_all[i][0]).flatten()
    my = np.array(mx_all[j][0]).flatten()
    # Nx x Ny block of the distance matrix
    dNxNy = dist[Nx,:][:,Ny].copy(order='C')
    return ot.sinkhorn2(mx, my, dNxNy, 1)
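To avoid one upload per edge, the per-edge inputs could be gathered into dense batches up front. The sketch below is only an illustration under assumptions that are not in the code above: every distribution is stored on a support of the same size n, and the names stack_edge_problems, supports and masses are hypothetical. It replaces the per-edge dist[Nx,:][:,Ny] with a single fancy-indexing call.

```python
import numpy as np

def stack_edge_problems(edges, supports, masses, dist):
    """Gather per-edge Sinkhorn inputs into dense batches.

    edges:    sequence of (i, j) node pairs
    supports: (N, n) integer array, supports[x] = node indices carrying m_x
              (hypothetical layout: all supports have the same size n)
    masses:   (N, n) float array, masses[x] = weights of m_x on supports[x]
    dist:     (N, N) pairwise distance matrix
    """
    edges = np.asarray(edges)
    src, dst = edges[:, 0], edges[:, 1]
    a = masses[src]                       # (E, n) source histograms
    b = masses[dst]                       # (E, n) target histograms
    rows = supports[src][:, :, None]      # (E, n, 1)
    cols = supports[dst][:, None, :]      # (E, 1, n)
    # Broadcasting the index arrays yields, for each edge, the same block
    # as dist[Nx, :][:, Ny] in the loop version.
    M = dist[rows, cols]                  # (E, n, n)
    return a, b, M
```

The three returned arrays can then be moved to the device once, instead of once per edge. Supports of different sizes would need padding, which this sketch does not handle.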
Below is a minimal working example. Please ignore everything except the part between the two lines of === signs.
import ot
import numpy as np
import scipy as sc
import scipy.sparse.linalg  # make sure sc.sparse.linalg is loaded

def main():
    import networkx as nx

    # some example graph
    G = nx.planted_partition_graph(4, 20, 0.6, 0.3, seed=2)
    L = nx.normalized_laplacian_matrix(G)

    # this just computes all distributions (IGNORE)
    mx_all = []
    for i in G.nodes:
        mx_all.append(mx_comp(L, 1, 1, i))

    # some random distance matrix (IGNORE)
    dist = np.random.randint(5, size=(nx.number_of_nodes(G), nx.number_of_nodes(G)))

    # =============================================================================
    # this is what needs to be parallelised on GPU
    W = np.zeros(G.number_of_edges())
    for i, e in enumerate(G.edges):
        print(i)
        W[i] = W_comp(mx_all, dist, e)

    return W

def W_comp(mx_all, dist, e):
    i = e[0]
    j = e[1]
    Nx = np.array(mx_all[i][1]).flatten()
    Ny = np.array(mx_all[j][1]).flatten()
    mx = np.array(mx_all[i][0]).flatten()
    my = np.array(mx_all[j][0]).flatten()
    dNxNy = dist[Nx,:][:,Ny].copy(order='C')
    return ot.sinkhorn2(mx, my, dNxNy, 1)
# =============================================================================

# some other functions (IGNORE)
def delta(i, n):
    p0 = np.zeros(n)
    p0[i] = 1.
    return p0

# all neighbourhood densities
def mx_comp(L, t, cutoff, i):
    N = np.shape(L)[0]
    mx_all = sc.sparse.linalg.expm_multiply(-t*L, delta(i, N))
    Nx_all = np.argwhere(mx_all > (1-cutoff)*np.max(mx_all))
    return mx_all, Nx_all

if __name__ == "__main__":
    main()
Thank you!!
There are several packages that allow you to run Python code on your GPU. One of them is numba.
If you want to use numba, the Anaconda Python distribution is recommended. Older setups also needed Anaconda Accelerate (conda install accelerate), but that package has been discontinued and its GPU features were folded into numba, so conda install numba together with a working CUDA toolkit should now be enough. This example shows how the GPU is used: https://gist.githubusercontent.com/aweeraman/ae6e40f54a924f1f5832081be9521d92/raw/d6775c421aa4fa4c0d582e6c58873499d28b913a/gpu.py
It's done by adding target='cuda' to the @vectorize decorator. Note the import: from numba import vectorize. The decorator takes the signature of the function that is to be accelerated as input.
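Since @vectorize builds elementwise functions, the Sinkhorn loop from the question first has to be expressed as bulk array operations before a GPU can help. Below is a minimal NumPy sketch of Sinkhorn iterations batched over all edges. It is not POT's implementation. It assumes every edge problem has the same shape, uses a fixed iteration count instead of a convergence check, and the name batched_sinkhorn2 is hypothetical. Running it on a GPU array library instead of NumPy is an untested assumption.

```python
import numpy as np

def batched_sinkhorn2(a, b, M, reg, n_iter=200):
    """Entropic OT cost for a batch of E problems of identical shape.

    a:   (E, n) source histograms
    b:   (E, m) target histograms
    M:   (E, n, m) cost matrices
    reg: entropic regularisation strength
    Returns an (E,) array with the transport cost <P, M> of each problem.
    """
    K = np.exp(-M / reg)                  # Gibbs kernels, (E, n, m)
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        # Alternate the two marginal scalings for all problems at once.
        v = b / np.einsum("enm,en->em", K, u)   # v = b / (K^T u)
        u = a / np.einsum("enm,em->en", K, v)   # u = a / (K v)
    P = u[:, :, None] * K * v[:, None, :]       # transport plans, (E, n, m)
    return np.sum(P * M, axis=(1, 2))
```

With this layout the inputs are transferred once for the whole graph, and each iteration is a pair of batched matrix-vector products, which is the kind of workload a GPU handles well.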
Good luck!
Sources:
https://weeraman.com/put-that-gpu-to-good-use-with-python-e5a437168c01
https://www.researchgate.net/post/How_do_I_run_a_python_code_in_the_GPU