I'm pretty strong with SQL, but I can't think of good solution to this "look-alike" data analysis problem:
Given a table with a set of integers, I need to match each integer with the integer in a second table that is most similar (smallest absolute difference). Normally I'd do a Cartesian join and order by the difference in numbers, but I need to only get one pairing for each row from each table so no value from either table can be used twice.
Any idea how to accomplish this?
EDIT: Example:
TABLE_A
34
57
108
219
380
5000
TABLE_B
4000
200
400
99
40
33
34
4600
The pairing would be one row from table_a and the closest row from table_b:
RESULT
34,34
57,40
108,99
219,200
380,400
5000,4600
So no row from either table appears twice.
EDIT: more clarification: I'm trying to solve this problem where given 1 row from table_a, we find the 1 row from table_b that's closest. That becomes a pair and is removed. Then take the next row from table_a and repeat. So we're trying to find the best match for each row and optimize that pairing, not trying to optimize total differences.
Assuming
where given 1 row from table_a, we find the 1 row from table_b that's closest
select
*
from
TABLE_A a
cross apply
(select top 1 Number from TABLE_B b order by abs(b.Number - a.Number)) b2
This also assume rows in b can be repeated: try it and see if it does what you want. However, this should fit your sample data so it would answer your question...
select v.*
from
(select a.value as avalue, b.value as bvalue,
(abs(a.value - b.value)) as difference
from
TABLE_A a,
TABLE_B b) v,
(select a.value as avalue, b.value as bvalue,
min((abs(a.value - b.value))) as difference
from
TABLE_A a,
TABLE_B b
group by a.value, b.value) m
where m.avalue = v.avalue and m.bvalue = v.value and m.difference = v.difference
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With