
Why does the order of function calls impact runtime?

I'm using PyTorch to run calculations on my GPU (RTX 3000, CUDA 11.1). One step involves calculating the distance between one point and an array of points. For kicks I tested two functions to determine which is faster, as follows:

import datetime as dt
import functools
import timeit
import torch
import numpy as np

device = torch.device("cuda:0")

# define functions for calculating distance
def dist_geom(a, b):
    dist = (a - b)**2
    dist = dist.sum(axis=1)**0.5

    return dist

def dist_linalg(a, b):
    dist = torch.linalg.norm(a - b, axis=1)

    return dist

   
# create dummy data
a = np.random.randint(0, 100000, (100000, 10, 10)).astype(np.float64)
b = np.random.randint(0, 100000, (1, 10)).astype(np.float64)

# send data to GPU
a = torch.from_numpy(a).to(device)
b = torch.from_numpy(b).to(device)


# test runtime of each
iterations = 1000

t = timeit.Timer(functools.partial(dist_linalg, a, b))
linalg_delta = t.timeit(number=iterations) / iterations
print("Linear algebra time: ", linalg_delta, " seconds per iteration")

t = timeit.Timer(functools.partial(dist_geom, a, b))
geom_delta = t.timeit(number=iterations) / iterations
print("Geometry time: ", geom_delta, " seconds per iteration")


print("linear algebra:geometry ratio: ", linalg_delta / geom_delta)

This gives the following output:

Linear algebra time:  0.000743145  seconds per iteration
Geometry time:  0.001446731  seconds per iteration
linear algebra:geometry ratio:  0.5136718574496572

So the linear algebra function is ~2x faster. But if I call the geometry function first:

t = timeit.Timer(functools.partial(dist_geom, a, b))
geom_delta = t.timeit(number=iterations) / iterations
print("Geometry time: ", geom_delta, " seconds per iteration")

t = timeit.Timer(functools.partial(dist_linalg, a, b))
linalg_delta = t.timeit(number=iterations) / iterations
print("Linear algebra time: ", linalg_delta, " seconds per iteration")      

print("linear algebra:geometry ratio: ", linalg_delta / geom_delta)

I get this output:

Geometry time:  0.001213497  seconds per iteration
Linear algebra time:  0.001136769  seconds per iteration
linear algebra:geometry ratio:  0.9367711663069623

The dist_geom time is close to the initial run (about 16% lower), but the dist_linalg time is now about 1.5x longer!

I've tested this multiple ways and the result is always the same: the call order seems to matter...a lot. I think I'm missing a fundamental point here, so any help in understanding what is going on will be appreciated (and I suspect it will be so simple I'll feel foolish).
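One way to check whether a difference comes from call order rather than from the functions themselves is to time them over several rounds and flip the order on every other round. This is only a minimal sketch: `interleaved_timings` and the stand-in CPU workloads are hypothetical names made up for illustration, not part of the code above.

```python
import statistics
import timeit


def interleaved_timings(funcs, rounds=6, iterations=100, warmup=5):
    """Time each callable in `funcs` over several rounds, reversing the
    call order on every other round so no function is always measured first.

    Returns {name: [seconds per call, one entry per round]}.
    """
    names = list(funcs)
    for name in names:                      # untimed warm-up calls
        for _ in range(warmup):
            funcs[name]()

    results = {name: [] for name in names}
    for r in range(rounds):
        order = names if r % 2 == 0 else names[::-1]
        for name in order:
            total = timeit.timeit(funcs[name], number=iterations)
            results[name].append(total / iterations)
    return results


if __name__ == "__main__":
    # Stand-in CPU workloads (assumption). With the code above these would be
    # functools.partial(dist_geom, a, b) and functools.partial(dist_linalg, a, b).
    demo = {
        "sum_squares": lambda: sum(i * i for i in range(1000)),
        "sorted_copy": lambda: sorted(range(1000, 0, -1)),
    }
    for name, times in interleaved_timings(demo).items():
        print(name, "median seconds per call:", statistics.median(times))
```

If the per-round numbers for one function swing depending on whether it ran first or second in that round, the measurement setup is the likely suspect rather than the function.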

I created two sets of tensors. The following yields the same runtime regardless of order.

# create 2 tensors for geometry test
a1 = np.random.randint(0, 100000, (100000, 10, 10)).astype(np.float64)
b1 = np.random.randint(0, 100000, (1, 10)).astype(np.float64)

a1 = torch.from_numpy(a1).to(device)
b1 = torch.from_numpy(b1).to(device)

t = timeit.Timer(functools.partial(dist_geom, a1, b1))
geom_delta = t.timeit(number=iterations) / iterations
print("Geometry time: ", geom_delta, " seconds per iteration")

# create 2 different tensors for the linalg function
a2 = np.random.randint(0, 100000, (100000, 10, 10)).astype(np.float64)
b2 = np.random.randint(0, 100000, (1, 10)).astype(np.float64)

a2 = torch.from_numpy(a2).to(device)
b2 = torch.from_numpy(b2).to(device)

t = timeit.Timer(functools.partial(dist_linalg, a2, b2))
linalg_delta = t.timeit(number=iterations) / iterations
print("Linear algebra time: ", linalg_delta, " seconds per iteration")      

print("linear algebra:geometry ratio: ", linalg_delta / geom_delta)


Geometry time:  0.0012010019999999998  seconds per iteration
Linear algebra time:  0.0007349769999999999  seconds per iteration
linear algebra:geometry ratio:  0.6119698385181707

That said, if I define both a1/b1 and a2/b2 before the function calls, I see the difference in times again. Initially I thought this was caused by memory load times, but that does not really fit, right?
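For a sense of how much memory is in play, here is a rough back-of-the-envelope count of the tensors each function creates per call. It assumes float64 data with the shapes used above, counts only the tensor each expression returns, and ignores any internal workspace or kernel fusion, so treat the totals as estimates. It does not by itself show what causes the timing difference.

```python
import math

BYTES_PER_FLOAT64 = 8


def tensor_bytes(shape, itemsize=BYTES_PER_FLOAT64):
    """Bytes needed for a dense tensor of the given shape."""
    return math.prod(shape) * itemsize


a_shape = (100000, 10, 10)   # shape of `a` in the question
out_shape = (100000, 10)     # shape after reducing over axis=1

# dist_geom: (a - b), then **2, then sum(axis=1), then **0.5
geom_results = [a_shape, a_shape, out_shape, out_shape]
# dist_linalg: (a - b), then the norm over axis=1
linalg_results = [a_shape, out_shape]

a_bytes = tensor_bytes(a_shape)
geom_total = sum(tensor_bytes(s) for s in geom_results)
linalg_total = sum(tensor_bytes(s) for s in linalg_results)

print("a itself:", a_bytes / 1e6, "MB")
print("dist_geom tensors created per call:", geom_total / 1e6, "MB")
print("dist_linalg tensors created per call:", linalg_total / 1e6, "MB")
```

Under these assumptions, dist_geom creates roughly twice as many bytes of new tensors per call as dist_linalg, which is at least consistent with it being the slower of the two.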

tnknepp asked Oct 26 '22 11:10


1 Answer

You can just add the following call between the two timed runs:

torch.cuda.empty_cache()

Full code:

import datetime as dt
import functools
import timeit
import torch
import numpy as np

device = torch.device("cuda:0")

# define functions for calculating distance
def dist_geom(a, b):
    dist = (a - b)**2
    dist = dist.sum(axis=1)**0.5

    return dist

def dist_linalg(a, b):
    dist = torch.linalg.norm(a - b, axis=1)

    return dist

   
# create dummy data
a = np.random.randint(0, 100000, (100000, 10, 10)).astype(np.float64)
b = np.random.randint(0, 100000, (1, 10)).astype(np.float64)

# send data to GPU
a = torch.from_numpy(a).to(device)
b = torch.from_numpy(b).to(device)


# test runtime of each
iterations = 1000

t = timeit.Timer(functools.partial(dist_linalg, a, b))
linalg_delta = t.timeit(number=iterations) / iterations
print("Linear algebra time: ", linalg_delta, " seconds per iteration")

torch.cuda.empty_cache()

t = timeit.Timer(functools.partial(dist_geom, a, b))
geom_delta = t.timeit(number=iterations) / iterations
print("Geometry time: ", geom_delta, " seconds per iteration")


print("linear algebra:geometry ratio: ", linalg_delta / geom_delta)
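The same idea can be wrapped in a small helper so every measurement starts from the same state. This is a sketch under stated assumptions: `timed`, `reset` and `sync` are hypothetical names, and the helper itself does not import torch. PyTorch queues GPU work asynchronously, so a timer that stops before the queue drains can charge one function's leftover work to whatever is timed next. Whether that is what happened in the question is an assumption, not something confirmed here.

```python
import timeit


def timed(fn, iterations=1000, reset=None, sync=None):
    """Average seconds per call of `fn`.

    reset: optional callable run once before timing starts
           (hypothetical hook, e.g. torch.cuda.empty_cache).
    sync:  optional callable run before the clock starts and again before
           it stops (hypothetical hook, e.g. torch.cuda.synchronize).
    """
    if reset is not None:
        reset()
    if sync is not None:
        sync()                      # drain work queued before the measurement

    start = timeit.default_timer()
    for _ in range(iterations):
        fn()
    if sync is not None:
        sync()                      # wait for queued work before reading the clock
    elapsed = timeit.default_timer() - start
    return elapsed / iterations
```

With the code above it could be called as timed(functools.partial(dist_geom, a, b), reset=torch.cuda.empty_cache, sync=torch.cuda.synchronize), and likewise for dist_linalg.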
dimabendera answered Oct 30 '22 00:10