I have a set of Django models as shown in the following diagram (the names of the reverse relationships are shown in the yellow bubbles):
(source: cbstaff.com)
In each relationship, a Person may have 0 or more of the items.
Additionally, the slug field is (unfortunately) not unique; multiple Person records may have the same slug fields. Essentially these records are duplicates.
I want to obtain a list of all records that meet the following criteria: All duplicate records (that is, having the same slug) with at least one Entry OR at least one Audio OR at least one Episode OR at least one Article.
So far, I have the following query:
Person.objects.values('slug').annotate(num_records=Count('slug')).filter(num_records__gt=1)
This groups all records by slug, then adds a num_records attribute that says how many records have that slug, but the additional filtering is not performed (and I don't even know if this would work right anyway, since, given a set of duplicate records, one may have, e.g., an Entry and the other may have an Article).
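To make the grouping concrete, here is a minimal sketch of roughly what that query asks the database to do, plus one way to express the extra "at least one related item" condition per slug group. It uses the standard-library sqlite3 module instead of the Django ORM so it runs on its own. The table and column names (person, entry, person_id) are hypothetical, and the single entry table stands in for Entry, Audio, Episode and Article.

```python
import sqlite3

# Hypothetical, simplified schema: one person table plus a single related
# table (entry) standing in for Entry, Audio, Episode and Article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE entry  (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (id, slug) VALUES
        (1, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'cat'), (5, 'cat');
    INSERT INTO entry (id, person_id) VALUES (10, 2), (11, 3);
""")

# Slugs shared by more than one person: the GROUP BY / HAVING shape that
# values('slug').annotate(...).filter(num_records__gt=1) corresponds to.
duplicate_slugs = [row[0] for row in conn.execute("""
    SELECT slug FROM person
    GROUP BY slug
    HAVING COUNT(slug) > 1
    ORDER BY slug
""")]

# Same grouping, but only for slugs where ANY person in the group owns a
# related row. Filtering by slug (not by person id) keeps every member of
# the group in the count, even members with no related rows of their own.
duplicates_with_items = [row[0] for row in conn.execute("""
    SELECT slug FROM person
    WHERE slug IN (
        SELECT p2.slug FROM person AS p2
        JOIN entry AS e ON e.person_id = p2.id
    )
    GROUP BY slug
    HAVING COUNT(slug) > 1
    ORDER BY slug
""")]

print(duplicate_slugs)        # ['ann', 'cat']
print(duplicates_with_items)  # ['ann']
```

Here 'bob' has an entry but is not a duplicate, and 'cat' is duplicated but has no related rows, so only 'ann' satisfies both conditions. This also covers the worry in the question: it does not matter which record in the group owns the related item.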
In a nutshell, I want to find all duplicate records and collapse them, along with their associated models, into one record.
What's the best way to do this with Django?
To filter a Django queryset by a list of values, use the filter method with the __in lookup. For example, to find Blog entries whose pk is 1, 4 or 7, call Blog.objects.filter(pk__in=[1, 4, 7]).
Nope. Django filters operate at the database level, generating SQL. To filter based on Python properties, you have to load the object into Python to evaluate the property--and at that point, you've already done all the work to load it.
filter(tags__in=tags) matches photos that have any of the tags, not only those that have all of them. Some photos that have only one of the desired tags may have exactly the number of tags you are looking for, and some photos that have all the desired tags may also have additional tags.
Django-filter is a generic, reusable application to alleviate writing some of the more mundane bits of view code. Specifically, it allows users to filter down a queryset based on a model's fields, displaying the form to let them do this.
I would do this in several queries. The first is your list of duplicate slugs, which you already have:
from django.db.models import Count
dupes = [p['slug'] for p in Person.objects.values('slug').annotate(num_records=Count('slug')).filter(num_records__gt=1)]
I would then loop through these, and for each one decide on which to keep (make an arbitrary decision - pick the first one). Then, for all the other primary keys, just update all the other objects to point to the primary key you have selected:
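for slug in dupes:
    # Order by id so the record that is kept is the same on every run
    pks = [p.id for p in Person.objects.filter(slug=slug).order_by('id')]
    for pk in pks[1:]:
        # Repoint each related object from the duplicate to the kept record.
        # Use whichever related models your schema has (the question lists
        # Article rather than Author).
        Audio.objects.filter(person=pk).update(person=pks[0])
        Author.objects.filter(person=pk).update(person=pks[0])
        Episode.objects.filter(person=pk).update(person=pks[0])
        Entry.objects.filter(person=pk).update(person=pks[0])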
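The loop above repoints the related objects but leaves the now-empty duplicate Person rows in place. As a rough sketch of the full "collapse" (repoint, then delete the extras, all in one transaction), here is the same idea written against the standard-library sqlite3 module so it can be run without a Django project. The function merge_duplicates, the table names and the person_id column are all hypothetical, not part of the original models.

```python
import sqlite3

def merge_duplicates(conn, related_tables):
    """Keep the lowest-id person per slug, repoint related rows, delete the rest."""
    with conn:  # commits on success, rolls back if any statement raises
        slugs = [row[0] for row in conn.execute(
            "SELECT slug FROM person GROUP BY slug HAVING COUNT(*) > 1")]
        for slug in slugs:
            ids = [row[0] for row in conn.execute(
                "SELECT id FROM person WHERE slug = ? ORDER BY id", (slug,))]
            keeper, extras = ids[0], ids[1:]
            for extra in extras:
                for table in related_tables:
                    # Table names come from a trusted, hard-coded list,
                    # so interpolating them into the SQL is acceptable here.
                    conn.execute(
                        f"UPDATE {table} SET person_id = ? WHERE person_id = ?",
                        (keeper, extra))
                # Delete only after every related row has been repointed.
                conn.execute("DELETE FROM person WHERE id = ?", (extra,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE entry  (id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE audio  (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (id, slug) VALUES (1, 'ann'), (2, 'ann'), (3, 'bob');
    INSERT INTO entry (id, person_id) VALUES (10, 2);
    INSERT INTO audio (id, person_id) VALUES (20, 1), (21, 2);
""")

merge_duplicates(conn, ["entry", "audio"])

people = conn.execute("SELECT id, slug FROM person ORDER BY id").fetchall()
entry_owners = [r[0] for r in conn.execute("SELECT person_id FROM entry ORDER BY id")]
audio_owners = [r[0] for r in conn.execute("SELECT person_id FROM audio ORDER BY id")]
print(people)        # [(1, 'ann'), (3, 'bob')]
print(entry_owners)  # [1]
print(audio_owners)  # [1, 1]
```

In Django the same shape would be the loop above wrapped in transaction.atomic(), followed by deleting the extra Person rows. The order matters: if the foreign keys cascade on delete, deleting a duplicate before repointing would take its related objects with it.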