
How to improve the performance of this tiny distance Python function

I'm running into a performance bottleneck when using a custom distance metric function for a clustering algorithm from sklearn.

The result as shown by Run Snake Run is this:

[Image: Run Snake Run profiler output]

Clearly the problem is the dbscan_metric function. The function looks very simple and I don't quite know what the best approach to speeding it up would be:

def dbscan_metric(a,b):
  if a.shape[0] != NUM_FEATURES:
    return np.linalg.norm(a-b)
  else:
    return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a-b)))

Any thoughts as to what is causing it to be this slow would be much appreciated.

houbysoft asked Nov 11 '22 04:11


1 Answer

I am not familiar with what the function does - but is there a possibility of repeated calculations? If so, you could memoize the function:

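import numpy as np

cache = {}
def dbscan_metric(a, b):
  diff = a - b

  if a.shape[0] != NUM_FEATURES:
    to_calc = diff
  else:
    to_calc = np.multiply(FTR_WEIGHTS, diff)

  # NumPy arrays are unhashable, so key the cache on the array's raw bytes
  # (this assumes every input has the same dtype).
  key = to_calc.tobytes()

  # Test membership rather than truthiness so a norm of 0.0 is cached too.
  if key not in cache:
    cache[key] = np.linalg.norm(to_calc)

  return cache[key]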
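
If the cost is mostly the overhead of calling a Python function once per pair of points, another option may be to avoid the custom metric altogether. This is only a sketch under assumptions: it covers just the weighted branch, and the values of NUM_FEATURES and FTR_WEIGHTS below are made-up stand-ins for the question's globals. Since multiplying the difference by the weights is the same as multiplying each point by the weights first, you could scale the data once and then use plain Euclidean distance (for example sklearn's built-in metric="euclidean"):

```python
import numpy as np

# Hypothetical stand-ins for the globals used in the question.
NUM_FEATURES = 4
FTR_WEIGHTS = np.array([1.0, 0.5, 2.0, 3.0])

def dbscan_metric(a, b):
    """Weighted branch of the question's metric, kept for comparison."""
    return np.linalg.norm(np.multiply(FTR_WEIGHTS, a - b))

# Made-up sample data: 5 points with NUM_FEATURES features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, NUM_FEATURES))

# Scale every row once, up front (broadcasts the weights across rows).
X_scaled = X * FTR_WEIGHTS

# Plain Euclidean distance on the scaled rows matches the weighted metric.
d_custom = dbscan_metric(X[0], X[1])
d_plain = np.linalg.norm(X_scaled[0] - X_scaled[1])
assert np.isclose(d_custom, d_plain)
```

Whether this actually helps depends on how often the unweighted branch is taken in your data, so it would be worth profiling again after the change.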
zallarak answered Nov 14 '22 23:11