Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:

class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')

class Keyword(models.Model)
    name = models.CharField(max_length=100)

I've got the following keywords saved:

science-fiction
fiction
history
science
astronomy

On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:

def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)

This works for the most part, however whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), then I get the same book appear more than once in the resulting QuerySet.

I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?

like image 461
rolling stone Avatar asked Aug 06 '13 03:08

rolling stone


2 Answers

The 2 models in your example are represented with 3 tables: book, keyword and book_keyword relation table to manage M2M field.

When you use keywords__name in filter call Django is using SQL JOIN to merge all 3 tables. This allows you to filter objects in 1st table by values from another table.

The SQL will be like this:

SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE %fiction%)

After JOIN your data looks like

| Book Table          | Relation table                     | Keyword table                |
|---------------------|------------------------------------|------------------------------|
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
| 1       | Book 1    | 1                | 2               | 2          | Fiction         |
| 2       | Book 2    | 2                | 2               | 2          | Fiction         |

Then when data is loaded from DB into Python you only receive data from book table. As you can see the Book 1 is duplicated there

This is how Many-to-many relation and JOIN works

like image 99
Igor Avatar answered Sep 18 '22 03:09

Igor


Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.

In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.

like image 44
Derek Kwok Avatar answered Sep 20 '22 03:09

Derek Kwok