How to improve the performance of this tiny distance Python function

Question

I'm running into a performance bottleneck when using a custom distance metric function for a clustering algorithm from sklearn.

The result as shown by Run Snake Run is this:

[Profiler screenshot from Run Snake Run; image not preserved]

Clearly the problem is the dbscan_metric function. The function looks very simple and I don't quite know what the best approach to speeding it up would be:

def dbscan_metric(a,b):
  if a.shape[0] != NUM_FEATURES:
    return np.linalg.norm(a-b)
  else:
    return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a-b)))

Any thoughts as to what is causing it to be this slow would be much appreciated.

zallarak · Accepted Answer

I am not familiar with what the function does, but is there a possibility of repeated calculations? If so, you could memoize the function:

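import numpy as np

cache = {}
def dbscan_metric(a, b):

  diff = a - b

  if a.shape[0] != NUM_FEATURES:
    to_calc = diff
  else:
    to_calc = np.multiply(FTR_WEIGHTS, diff)

  # NumPy arrays are unhashable, so key the cache on their raw bytes.
  # Test membership with "in" so a cached distance of 0.0 still counts as a hit.
  key = to_calc.tobytes()
  if key not in cache:
    cache[key] = np.linalg.norm(to_calc)

  return cache[key]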
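
To see how the memoized version behaves, here is a minimal, self-contained sketch. The values of NUM_FEATURES and FTR_WEIGHTS and the sample points are made up for illustration (the question does not give them), and the cache is keyed on the array's raw bytes because NumPy arrays cannot be used as dictionary keys directly.

```python
import numpy as np

# Hypothetical stand-ins for the question's globals (not given in the original).
NUM_FEATURES = 3
FTR_WEIGHTS = np.array([1.0, 2.0, 0.5])

cache = {}

def dbscan_metric(a, b):
    diff = a - b
    # Apply the feature weights only when the vector has the expected length.
    if a.shape[0] != NUM_FEATURES:
        to_calc = diff
    else:
        to_calc = np.multiply(FTR_WEIGHTS, diff)
    key = to_calc.tobytes()  # arrays are unhashable; their raw bytes are not
    if key not in cache:
        cache[key] = np.linalg.norm(to_calc)
    return cache[key]

# Illustrative sample points (assumed, not from the question).
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 0.0, 1.0])

first = dbscan_metric(a, b)    # computed and stored
second = dbscan_metric(a, b)   # served from the cache
reverse = dbscan_metric(b, a)  # b - a has different bytes, so a new entry is added
```

Note that the cache only pays off if the exact same difference vector comes up repeatedly, and that the reversed pair (b, a) misses the cache even though the distance is the same. Building the key also costs time, so for short vectors it may be worth profiling both versions before keeping the cache.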

Tags: python, optimization, numpy, scikit-learn

houbysoft
