I have two arrays (x1
and x2
) of equal length that have overlapping ranges of values.
I need to find a value q
such that l1-l2
is minimized, and
l1 = x1[np.where(x1 > q)].shape[0]
l2 = x2[np.where(x2 < q)].shape[0]
I need this to be reasonably high-performance since the arrays can be large. A solution using native numpy routines would be preferred.
There may be a smarter way to look for a value, but you can do an exhaustive search as follows:
>>> x1 = np.random.rand(10)
>>> x2 = np.random.rand(10)
>>> x1.sort()
>>> x2.sort()
>>> x1
array([ 0.12568451, 0.30256769, 0.33478133, 0.41973331, 0.46493576,
0.52173197, 0.72289189, 0.72834444, 0.78662283, 0.78796277])
>>> x2
array([ 0.05513774, 0.21567893, 0.29953634, 0.37426842, 0.40000622,
0.54602497, 0.7225469 , 0.80116148, 0.82542633, 0.86736597])
We can compute l1
if q
is one of the items in x1
as:
>>> l1_x1 = len(x1) - np.arange(len(x1)) - 1
>>> l1_x1
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
And l2
for the same q
as:
>>> l2_x1 = np.searchsorted(x1, x2)
>>> l2_x1
array([ 0, 1, 1, 3, 3, 6, 6, 10, 10, 10], dtype=int64)
You can similarly get values for l1
and l2
when q
is in x2
:
>>> l2_x2 = np.arange(len(x2))
>>> l2_x2
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> l1_x2 = len(x1) - np.searchsorted(x1, x2, side='right')
>>> l1_x2
array([10, 9, 9, 7, 7, 4, 4, 0, 0, 0], dtype=int64)
And you then simply check for the minimum of l1 - l2
:
>>> np.concatenate((l1_x1 - l2_x1, l1_x2 - l2_x2))
array([ 9, 7, 6, 3, 2, -2, -3, -8, -9, -10, 10, 8, 7,
4, 3, -1, -2, -7, -8, -9], dtype=int64)
>>> q_idx = np.argmin(np.abs(np.concatenate((l1_x1 - l2_x1, l1_x2 - l2_x2))))
>>> q = x1[q_idx] if q_idx < len(x1) else x2[q_idx - len(x1)]
>>> q
0.54602497466094291
>>> x1[x1 > q].shape[0]
4L
>>> x2[x2 < q].shape[0]
5L
I think I may have found a fairly simple way to do it.
x1 = (50 - 10) * np.random.random(10000) + 10
x2 = (75 - 25) * np.random.random(10000) + 25
x1.sort()
x2.sort()
x2 = x2[::-1] # reverse the array
# The overlap point should fall where the difference is smallest
diff = np.abs(x1 - x2)
# get the index of where the minimum occurs
loc = np.where(diff == np.min(diff))
q1 = x1[loc] # 38.79087351
q2 = x2[loc] # 38.79110941
M4rtini's solution produces q = 38.7867527
.
This is fundamentally an interval problem so you might want to do some reading on Interval trees, but you don't need to understand interval trees to solve this problem.
If you think of every (x1[i], x2[i])
as being an interval, you're looking for the value q
which splits the intervals into two groups as evenly as possible ignoring intervals that overlap q
. Lets take the easy case first:
from numpy import array
x1 = array([19, 32, 47, 13, 56, 1, 87, 48])
x2 = array([44, 38, 50, 39, 85, 26, 92, 64])
x1sort = np.sort(x1)
x2sort = np.sort(x2)[::-1]
diff = abs(x2sort - x1sort)
mindiff = diff.argmin()
print mindiff, x2sort[mindiff], x1sort[mindiff]
# 4 44 47
@xvtk's solution works well in this case and gives us a range of [44, 47]. Because no intervals overlap the range, all values of q
in the range are equivalent and yield an optimal result. Here is an example that is a little more tricky:
x1 = array([12, 65, 46, 81, 71, 77, 37])
x2 = array([ 20, 85, 59, 122, 101, 87, 58])
x1sort = np.sort(x1)
x2sort = np.sort(x2)[::-1]
diff = abs(x2sort - x1sort)
mindiff = diff.argmin()
print mindiff, x2sort[mindiff], x1sort[mindiff], x1sort[mindiff-1]
# 59 71 65
Here the solution gives us a range of [59, 71] but notice that not all values in the range are equivalent. Anything to the left of the green line will produce 3 and 4 intervals on the left and right respectively, while anything to the right of the green line will produce 3 intervals on both sides.
I'm pretty sure that the optimal solution is guaranteed to be in the range produced by @xvtk's solution. It's possible that one of the red lines is guaranteed to be an optimal solution, though I'm not sure on this point. Hope that helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With