Compare DB row values efficiently

Question

I want to loop through a database of documents and calculate a pairwise comparison score.

A simplistic, naive method would nest a loop within another loop. This would result in the program comparing documents twice and also comparing each document to itself.

Is there a name for the algorithm for doing this task efficiently? Is there a name for this approach?

Thanks.

Hogan · Accepted Answer

Assume all items have a number ItemNumber

Simple solution -- always have the 2nd element's ItemNumber greater than the first item.

eg

for (firstitem = 1 to maxitemnumber)
  for (seconditem = firstitemnumber+1 to maxitemnumber)
    compare(firstitem, seconditem)

visual note: if you think of the compare as a matrix (item number of one on one axis item of the other on the other axis) this looks at one of the triangles.

........
x.......
xx......
xxx.....
xxxx....
xxxxx...
xxxxxx..
xxxxxxx.

symcbean · Answer

I don't think its complicated enough to qualify for a name.

You can avoid duplicate pairs just by forcing a comparison on any value which might be different between different rows - the primary key is an obvious choice, e.g.

Unique pairings:

SELECT a.item as a_item, b.item as b_item
FROM table AS a, table AS b
WHERE a.id<b.id

Potentially there are a lot of ways in which the the comparison operation can be used to generate data summmaries and therefore identify potentially similar items - for single words the soundex is an obvious choice - however you don't say what your comparison metric is.

C.

Compare DB row values efficiently

Tags:

python

database

mysql

logic

seanieb

2 Answers

Hogan

symcbean

Recent Activity

Donate For Us

Compare DB row values efficiently

Tags:

python

database

mysql

logic

seanieb

2 Answers

Hogan

symcbean

Related questions

Recent Activity

Donate For Us