I'm running into a performance bottleneck when using a custom distance metric function for a clustering algorithm from sklearn.
The result as shown by Run Snake Run is this:
Clearly the problem is the dbscan_metric function. The function looks very simple and I don't quite know what the best approach to speeding it up would be:
    def dbscan_metric(a, b):
        if a.shape[0] != NUM_FEATURES:
            return np.linalg.norm(a - b)
        else:
            return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a - b)))
Any thoughts as to what is causing it to be this slow would be much appreciated.
I am not familiar with what the function does, but is there a possibility of repeated calculations? If so, you could memoize the function:
    cache = {}

    def dbscan_metric(a, b):
        diff = a - b
        if a.shape[0] != NUM_FEATURES:
            to_calc = diff
        else:
            to_calc = np.multiply(FTR_WEIGHTS, diff)
        # NumPy arrays are not hashable, so key the cache on their raw bytes.
        key = to_calc.tobytes()
        if key not in cache:
            cache[key] = np.linalg.norm(to_calc)
        return cache[key]
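Another option, sketched here under the assumption that every sample has exactly NUM_FEATURES features (so the unweighted branch is never taken): the weighted norm of a difference equals the plain Euclidean norm of the difference of pre-weighted vectors, so you may be able to scale the data once and skip the Python callback altogether. The values of NUM_FEATURES and FTR_WEIGHTS and the random data X below are made up for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the globals used in the question.
NUM_FEATURES = 3
FTR_WEIGHTS = np.array([0.5, 2.0, 1.0])

def dbscan_metric(a, b):
    # The metric from the question, unchanged in behavior.
    if a.shape[0] != NUM_FEATURES:
        return np.linalg.norm(a - b)
    return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a - b)))

# Made-up data: 5 samples with NUM_FEATURES features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, NUM_FEATURES))

# Apply the weights once, up front, to every sample.
X_scaled = X * FTR_WEIGHTS

# The custom metric on the raw data and the plain Euclidean
# distance on the scaled data should agree.
custom = dbscan_metric(X[0], X[1])
plain = np.linalg.norm(X_scaled[0] - X_scaled[1])
matches = bool(np.isclose(custom, plain))
```

If that assumption holds for your data, X_scaled could be passed to the clustering algorithm with its built-in Euclidean metric instead of a Python callable. That is typically much faster, because the distances are no longer computed one pair at a time in Python.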