How can I compare each object with every other object and, if ratio() > 0.7, set possible_duplicate=True on both objects?
My attempt:
from difflib import SequenceMatcher
from django.db import models

class Item(models.Model):
    name = models.CharField(max_length=255)
    desc = models.TextField()
    possible_duplicate = models.BooleanField(default=False)
items = Item.objects.all()
for item in items:
    obj = Item.objects.get(pk=item.pk)
    similarity = SequenceMatcher(None, item.desc, obj.desc).ratio()
    if similarity > 0.7:
        item.possible_duplicate = True
        item.save()
        obj.possible_duplicate = True
        obj.save()
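A quick way to see what this loop actually computes, using made-up strings in place of Item.desc: because Item.objects.get(pk=item.pk) returns the same row as item, each description is compared with itself, and SequenceMatcher reports 1.0 for identical inputs (including two empty strings), so every row crosses the 0.7 threshold.

```python
from difflib import SequenceMatcher

# Hypothetical descriptions standing in for Item.desc values.
descs = ["red cotton t-shirt", "blue denim jeans", ""]

# What the loop effectively does: compare each description with itself.
self_ratios = [SequenceMatcher(None, d, d).ratio() for d in descs]
print(self_ratios)  # [1.0, 1.0, 1.0]
```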
You can use itertools.combinations to generate every pair of objects to compare:
>>> import itertools
>>> items = [1, 2, 3]
>>> itertools.combinations(items, 2) # 2 -> yields tuples with 2 items
<itertools.combinations object at 0x7f5e456d5ba8>
>>> list(itertools.combinations(items, 2))
[(1, 2), (1, 3), (2, 3)]
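combinations(items, 2) never pairs an element with itself and yields each unordered pair exactly once, so n items produce n * (n - 1) / 2 comparisons. A small check (math.comb needs Python 3.8+) that also shows how quickly the number of ratio() calls grows:

```python
import itertools
import math

items = [1, 2, 3, 4]
pairs = list(itertools.combinations(items, 2))

# No element is paired with itself, and each unordered pair appears once.
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# n items always give n * (n - 1) / 2 pairs.
print(len(pairs), math.comb(len(items), 2))  # 6 6

# The count grows quadratically: 1,000 items mean 499,500 comparisons.
print(math.comb(1000, 2))  # 499500
```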
import itertools
from difflib import SequenceMatcher

items = Item.objects.all()
for item1, item2 in itertools.combinations(items, 2):
    similarity = SequenceMatcher(None, item1.desc, item2.desc).ratio()
    if similarity > 0.7:
        for item in item1, item2:
            item.possible_duplicate = True
            item.save()
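To try the pairwise logic without a database, here is a minimal sketch on plain strings. find_possible_duplicates is a hypothetical helper and the descriptions are invented; it returns the indices that would get possible_duplicate=True.

```python
import itertools
from difflib import SequenceMatcher

def find_possible_duplicates(texts, threshold=0.7):
    """Hypothetical helper: indices of texts whose ratio() with at least
    one other text exceeds threshold (plain strings stand in for Item.desc)."""
    flagged = set()
    # enumerate() keeps track of positions; each pair (i, j) with i < j
    # is compared exactly once.
    for (i, a), (j, b) in itertools.combinations(enumerate(texts), 2):
        if SequenceMatcher(None, a, b).ratio() > threshold:
            # Mark both sides of the pair, as the question asks.
            flagged.update((i, j))
    return flagged

descs = [
    "Stainless steel water bottle, 750 ml",
    "Stainless steel water bottle, 500 ml",
    "Wooden chopping board",
]
print(sorted(find_possible_duplicates(descs)))  # [0, 1]
```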
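In your code you are comparing each object with itself: Item.objects.get(pk=item.pk) fetches the same row you are already looping over, so the ratio is always 1.0 and every item gets flagged. To compare all objects with each other, you can use itertools.combinations: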
import itertools
from difflib import SequenceMatcher

items_list = list(Item.objects.all())
for a, b in itertools.combinations(items_list, 2):
    similarity = SequenceMatcher(None, a.desc, b.desc).ratio()
    if similarity > 0.7:
        a.possible_duplicate = True
        a.save()
        b.possible_duplicate = True
        b.save()
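Both loops call save() inside the pair loop, so an item that resembles several others is written several times, and the full ratio() is computed for every pair. A sketch of two possible tweaks, assuming the data is pulled as (pk, desc) tuples: difflib's real_quick_ratio() and quick_ratio() return cheap upper bounds on ratio(), so a pair can be rejected before the expensive call, and the flagged keys can be collected first and written once. The helpers similar and duplicate_keys are hypothetical, and the Django query in the final comment is shown but not executed here.

```python
import itertools
from difflib import SequenceMatcher

def similar(a, b, threshold=0.7):
    """True if SequenceMatcher(None, a, b).ratio() exceeds threshold."""
    sm = SequenceMatcher(None, a, b)
    # real_quick_ratio() and quick_ratio() are upper bounds on ratio():
    # if either is already <= threshold, ratio() cannot exceed it.
    return (
        sm.real_quick_ratio() > threshold
        and sm.quick_ratio() > threshold
        and sm.ratio() > threshold
    )

def duplicate_keys(records, threshold=0.7):
    """records: iterable of (key, text) pairs, e.g. (item.pk, item.desc).
    Returns the set of keys involved in at least one similar pair."""
    flagged = set()
    for (ka, ta), (kb, tb) in itertools.combinations(records, 2):
        if similar(ta, tb, threshold):
            flagged.update((ka, kb))
    return flagged

# Hypothetical rows; with Django they could come from
# Item.objects.values_list("pk", "desc").
rows = [(1, "Blue ceramic mug"), (2, "Blue ceramic mugs"), (3, "USB-C cable")]
pks = duplicate_keys(rows)
print(sorted(pks))  # [1, 2]

# With Django, the flagged rows could then be updated in a single query:
# Item.objects.filter(pk__in=pks).update(possible_duplicate=True)
```

Note that QuerySet.update() writes directly to the database and does not call save() or send save signals, which may or may not matter for your models.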