I'm trying to cluster location data using DBSCAN (the scikit-learn implementation). My data is a NumPy array, but to use DBSCAN with the haversine formula I need to create a distance matrix. When I try to do this I get a "'module' object is not callable" error. From what I've read online this is an import error, but I'm pretty sure that's not the case for me. I've written my own haversine distance function, but I'm sure the error is not in it.
This is my input data, an np array (ResultArray).
[[ 53.3252628 -6.2644198 ]
[ 53.3287395 -6.2646543 ]
[ 53.33321202 -6.24785807]
[ 53.3261015 -6.2598324 ]
[ 53.325291 -6.2644105 ]
[ 53.3281323 -6.2661467 ]
[ 53.3253074 -6.2644483 ]
[ 53.3388147 -6.2338417 ]
[ 53.3381102 -6.2343826 ]
[ 53.3253074 -6.2644483 ]
[ 53.3228188 -6.2625379 ]
[ 53.3253074 -6.2644483 ]]
And this is the line of code that is erroring.
distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResultArray,(lambda u,v: haversine(u,v))))
This is the error message:
File "Location.py", line 48, in <module>
distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResArray,(lambda u,v: haversine(u,v))))
File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1118, in pdist
dm[k] = dfun(X[i], X[j])
File "Location.py", line 48, in <lambda>
distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResArray,(lambda u,v: haversine(u,v))))
TypeError: 'module' object is not callable
I import scipy as sp (import scipy as sp).
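The traceback places the TypeError inside the lambda, at the call haversine(u, v). That suggests the name haversine is bound to a module rather than to a function. This is only a guess, since the question does not show how haversine was imported: if the function lives in a file named haversine.py, then import haversine binds the module, and the call would need to be haversine.haversine(u, v), or the import changed to from haversine import haversine. The sketch below reproduces the same message with a standard-library module as a stand-in.

```python
import math  # stand-in for any module that shares its name with the function you meant to call

try:
    math(1.0)  # calling the module object itself, not a function inside it
except TypeError as err:
    message = str(err)

print(message)
```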
With SciPy you can define a custom distance function, as described in the documentation at this link and quoted here for convenience:
Y = pdist(X, f)
Computes the distance between all pairs of vectors in X using the user supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows:
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
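As a quick check of the quoted documentation, here is a minimal sketch that passes the same lambda to pdist and compares the result with the built-in "euclidean" metric. The three points in X are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Small illustrative data set: three points in the plane (made up for this sketch).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Custom 2-arity function passed directly as the metric.
custom = pdist(X, lambda u, v: np.sqrt(((u - v) ** 2).sum()))

# Built-in metric for comparison.
builtin = pdist(X, "euclidean")

print(custom)              # condensed form: distances for pairs (0,1), (0,2), (1,2)
print(squareform(custom))  # full symmetric 3x3 matrix
```

pdist returns the condensed form (one entry per pair), and squareform expands it into the full symmetric matrix.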
Here is my version of the code, adapted from the code at this link:
from numpy import sin, cos, arctan2, sqrt, pi  # import from numpy

# earth's mean radius = 6,371km
EARTHRADIUS = 6371.0

def getDistanceByHaversine(loc1, loc2):
    '''Haversine formula - give each location as a
    (lon_decimal, lat_decimal) pair in decimal degrees'''
    #
    # "unpack" each point: index 0 is longitude, index 1 is latitude
    lat1 = loc1[1]
    lon1 = loc1[0]
    lat2 = loc2[1]
    lon2 = loc2[0]
    #
    # convert to radians ##### Completely identical
    lon1 = lon1 * pi / 180.0
    lon2 = lon2 * pi / 180.0
    lat1 = lat1 * pi / 180.0
    lat2 = lat2 * pi / 180.0
    #
    # haversine formula #### Same, but atan2 named arctan2 in numpy
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (sin(dlat/2.0))**2 + cos(lat1) * cos(lat2) * (sin(dlon/2.0))**2
    c = 2.0 * arctan2(sqrt(a), sqrt(1.0-a))
    km = EARTHRADIUS * c
    return km
And calling in the following way:
from scipy import spatial
D = spatial.distance.pdist(A, lambda u, v: getDistanceByHaversine(u,v))
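In my implementation the matrix A has the longitude values in its first column and the latitude values in its second column, both in decimal degrees. Note that ResultArray in the question stores latitude first, so swap the columns (or the indices inside the function) before using it with this code.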
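The question's ResultArray stores latitude in the first column and longitude in the second, the opposite of the column order assumed for A above. A minimal sketch for that layout follows, combining pdist with squareform to get the full distance matrix the question asks for. The function name haversine_latlon is hypothetical, and only the first four rows of the question's data are used.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same value as in the answer above

def haversine_latlon(p, q):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1, lat2, lon2 = np.radians([p[0], p[1], q[0], q[1]])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1.0 - a))

# First four rows of ResultArray from the question (latitude first, then longitude).
ResultArray = np.array([[53.3252628, -6.2644198],
                        [53.3287395, -6.2646543],
                        [53.33321202, -6.24785807],
                        [53.3261015, -6.2598324]])

# pdist accepts the function itself; no lambda wrapper is needed.
distance_matrix = squareform(pdist(ResultArray, haversine_latlon))
print(distance_matrix)
```

A square matrix like this can then be given to scikit-learn's DBSCAN by constructing it with metric="precomputed", in which case eps is expressed in the same unit as the matrix (kilometres here).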
Please refer to @TommasoF's answer. This answer is wrong: pdist does allow you to choose a custom distance function. I will delete this answer once it is no longer the accepted answer.
Simply put, scipy's pdist does not allow you to pass in a custom distance function. As you can read in the docs, you have some options, but haversine distance is not in the list of supported metrics.
(Matlab's pdist does support the option, though; see here.)
You need to do the calculation "manually", i.e. with loops. Something like this will work:
from numpy import array, zeros

def haversine(lon1, lat1, lon2, lat2):
    """ See the link below for a possible implementation """
    pass

# example input (yours, truncated)
ResultArray = array([[ 53.3252628, -6.2644198 ],
                     [ 53.3287395 , -6.2646543 ],
                     [ 53.33321202 , -6.24785807],
                     [ 53.3253074 , -6.2644483 ]])

N = ResultArray.shape[0]
distance_matrix = zeros((N, N))
for i in range(N):
    for j in range(N):
        lati, loni = ResultArray[i]
        latj, lonj = ResultArray[j]
        distance_matrix[i, j] = haversine(loni, lati, lonj, latj)
        distance_matrix[j, i] = distance_matrix[i, j]

print(distance_matrix)
[[ 0. 0.38666203 1.41010971 0.00530489]
[ 0.38666203 0. 1.22043364 0.38163748]
[ 1.41010971 1.22043364 0. 1.40848782]
[ 0.00530489 0.38163748 1.40848782 0. ]]
Just for reference, a Python implementation of the haversine formula can be found here.
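The nested loops above can also be replaced by NumPy broadcasting. The following is a minimal sketch, assuming rows of (lat, lon) in decimal degrees and a mean Earth radius of 6371 km; the function name haversine_matrix is hypothetical and this is not the linked implementation. The values may differ slightly from the printed matrix depending on the Earth radius used.

```python
import numpy as np

def haversine_matrix(latlon_deg, radius_km=6371.0):
    """Pairwise great-circle distances (km) for an (N, 2) array of (lat, lon) in degrees."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    # Differences between every pair of points, shape (N, N).
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    # Clip guards against rounding pushing a marginally outside [0, 1].
    return 2.0 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Same truncated input as in the loop example (latitude first, then longitude).
ResultArray = np.array([[53.3252628, -6.2644198],
                        [53.3287395, -6.2646543],
                        [53.33321202, -6.24785807],
                        [53.3253074, -6.2644483]])

distance_matrix = haversine_matrix(ResultArray)
print(distance_matrix)
```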