I have a cross-identification problem on data with 2 axes like
A = array([['x0', 'y0', 'data0', 'data0'],
['x0', 'y0', 'data0', 'data0'],
['x0', 'y0', 'data0', 'data0']])
B = array([['x1', 'y1', 'data1', 'data1'],
['x1', 'y1', 'data1', 'data1'],
['x1', 'y1', 'data1', 'data1']])
What I need is to find the rows of 2 lists that have the same position. The position needs to be discribed as their distance is close enough, which is:
distance = acos(cos(y0)*cos(y1)*cos(x0-x1)+sin(y0)*sin(y1))
if(distance < 0.001):
position = True
Currently, I use a code like below:
from math import *
def distance(x1,y1,x2,y2):
a = acos(cos(y1)*cos(y2)*cos(x1-x2)+sin(y1)*sin(y2))
if(a < 0.001):
return True
else:
return False
f = open('cross-identification')
for i in range(len(A[0])):
for j in range(len(B[0])):
if(distance(A[0][i],A[1][i],B[0][j],B[1][j])==True):
print(A[0][i],A[1][i],A[2][i],B[2][j],A[3][i],B[3][j],file=f)
else:continue
It's OK with a few rows, but the problem is that I have MASSIVE data and the speed is extremely slow. Are there any ways that can make it quicker?
BTW I have read this, close to what I want but I can't just change it. Maybe I can get some help from u?
In order not only to avoid the expensive Haversine formula but also to open up the option of using KDTrees, I'd recommend translating to Euclidean coordinates and distances.
def to_eucl_coords(lat, lon):
z = np.sin(lat)
x = np.sin(lon)*np.cos(lat)
y = np.cos(lon)*np.cos(lat)
return x, y, z
def to_eucl_dist(sphdist):
return 2*np.arcsin(sphdist/2)
KDTrees are easy to use, here is a skeleton which should get you started.
from scipy.spatial import cKDTree as KDTree
eucl_1 = np.c_[to_eucl_coords(lat1, lon1)]
eucl_2 = np.c_[to_eucl_coords(lat2, lon2)]
t1, t2 = KDTree(eucl_1), KDTree(eucl2)
neighbors = t1.query_ball_tree(t2, threshold)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With