Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare DB row values efficiently

I want to loop through a database of documents and calculate a pairwise comparison score.

A simplistic, naive method would nest a loop within another loop. This would result in the program comparing documents twice and also comparing each document to itself.

Is there a name for the algorithm for doing this task efficiently? Is there a name for this approach?

Thanks.

like image 368
seanieb Avatar asked Feb 11 '26 17:02

seanieb


2 Answers

Assume all items have a number ItemNumber

Simple solution -- always have the 2nd element's ItemNumber greater than the first item.

eg

for (firstitem = 1 to maxitemnumber)
  for (seconditem = firstitemnumber+1 to maxitemnumber)
    compare(firstitem, seconditem)

visual note: if you think of the compare as a matrix (item number of one on one axis item of the other on the other axis) this looks at one of the triangles.

........
x.......
xx......
xxx.....
xxxx....
xxxxx...
xxxxxx..
xxxxxxx.
like image 141
Hogan Avatar answered Feb 13 '26 09:02

Hogan


I don't think its complicated enough to qualify for a name.

You can avoid duplicate pairs just by forcing a comparison on any value which might be different between different rows - the primary key is an obvious choice, e.g.

Unique pairings:

SELECT a.item as a_item, b.item as b_item
FROM table AS a, table AS b
WHERE a.id<b.id

Potentially there are a lot of ways in which the the comparison operation can be used to generate data summmaries and therefore identify potentially similar items - for single words the soundex is an obvious choice - however you don't say what your comparison metric is.

C.

like image 39
symcbean Avatar answered Feb 13 '26 09:02

symcbean



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!