
Django Serializer Nested Creation: How to avoid N+1 queries on relations

There are dozens of posts about n+1 queries in nested relations in Django, but I can't seem to find the answer to my question. Here's the context:

The Models

from django.db import models

class Book(models.Model):
    title = models.CharField(max_length=255)

class Tag(models.Model):
    book = models.ForeignKey('app.Book', on_delete=models.CASCADE, related_name='tags')
    category = models.ForeignKey('app.TagCategory', on_delete=models.PROTECT)
    page = models.PositiveIntegerField()

class TagCategory(models.Model):
    title = models.CharField(max_length=255)
    key = models.CharField(max_length=255)

A book has many tags, each tag belongs to a tag category.

The Serializers

from django.db import transaction
from rest_framework import serializers

class TagSerializer(serializers.ModelSerializer):
    class Meta:
        model = Tag
        exclude = ['id', 'book']

class BookSerializer(serializers.ModelSerializer):
    tags = TagSerializer(many=True, required=False)

    class Meta:
        model = Book
        fields = ['title', 'tags']

    def create(self, validated_data):
        with transaction.atomic():
            # 'tags' is declared with required=False, so it may be absent.
            tags = validated_data.pop('tags', [])
            book = Book.objects.create(**validated_data)
            Tag.objects.bulk_create([Tag(book=book, **tag) for tag in tags])
        return book

The Problem

I am trying to POST to the BookViewSet with the following example data:

{
  "title": "The Jungle Book",
  "tags": [
    { "page": 1, "category": 36 }, // plot intro
    { "page": 2, "category": 37 }, // character intro
    { "page": 4, "category": 37 }, // character intro
    // ... up to 1000 tags
  ]
}

This all works. However, during the POST, the serializer issues a separate query for each tag to check that its category_id is a valid one.

With up to 1000 nested tags in a call, I can't afford this.
How do I "prefetch" for the validation?
If this is impossible, how do I turn off the validation that checks if a foreign_key id is in the database?

EDIT: Additional Info

Here is the view:

class BookViewSet(views.APIView):

    queryset = Book.objects.all().select_related('tags', 'tags__category')
    permission_classes = [IsAdminUser]

    def post(self, request, format=None):
        serializer = BookSerializer(data=request.data)
        if serializer.is_valid():
            serializer.save()
            return Response(serializer.data, status=status.HTTP_201_CREATED)
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

jbodily asked Nov 24 '18 23:11


1 Answer

In my opinion, the DRF serializer is not the place to optimize a DB query. A serializer has two jobs:

  1. Deserialize input data and check its validity.
  2. Serialize output data.

Therefore the correct place to optimize your query is the corresponding view.
We will use the select_related method to avoid the N+1 database queries. According to the Django documentation, it:

Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
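
To make the "single more complex query" idea concrete, here is a minimal sketch that uses the standard-library sqlite3 module instead of Django. The table names, the CountingDB helper and both functions are hypothetical stand-ins for the models in the question, not anything Django generates. It only contrasts one lookup per tag with a single JOIN.

```python
import sqlite3


class CountingDB:
    """Wraps an in-memory SQLite connection and counts queries sent via run()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        # Hypothetical tables mirroring Tag and TagCategory from the question.
        self.conn.executescript(
            """
            CREATE TABLE tagcategory (id INTEGER PRIMARY KEY, title TEXT);
            CREATE TABLE tag (
                id INTEGER PRIMARY KEY,
                page INTEGER,
                category_id INTEGER REFERENCES tagcategory(id)
            );
            INSERT INTO tagcategory (id, title) VALUES
                (36, 'plot intro'), (37, 'character intro');
            INSERT INTO tag (page, category_id) VALUES (1, 36), (2, 37), (4, 37);
            """
        )

    def run(self, sql, params=()):
        """Execute one query, count it, and return all rows."""
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def tags_n_plus_one(db):
    """One query for the tags, then one more per tag for its category."""
    tags = db.run("SELECT page, category_id FROM tag ORDER BY page")
    return [
        (page, db.run("SELECT title FROM tagcategory WHERE id = ?", (cid,))[0][0])
        for page, cid in tags
    ]


def tags_joined(db):
    """A single JOIN fetches every tag together with its category title."""
    return db.run(
        "SELECT tag.page, tagcategory.title FROM tag "
        "JOIN tagcategory ON tagcategory.id = tag.category_id "
        "ORDER BY tag.page"
    )
```

With the three sample tags, tags_n_plus_one issues four queries and tags_joined issues one, and both return the same rows.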

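You will need to modify the part of your view code that creates the corresponding queryset. Note that select_related only follows forward foreign-key and one-to-one relationships. 'tags' is a reverse relation from Book, so it has to be fetched with prefetch_related, and select_related can then be applied to each tag's category.
Adding a related_name to the Tag.category field definition is optional: it only names the reverse accessor on TagCategory, and lookups that start from Tag still use the field name 'category'.

Example:

# In your Tag model (optional, only names the reverse accessor on TagCategory):
category = models.ForeignKey(
    'app.TagCategory', on_delete=models.PROTECT, related_name='categories'
)

# In your queryset defining part of your View:
from django.db.models import Prefetch

class BookViewSet(views.APIView):

    queryset = Book.objects.prefetch_related(
        Prefetch('tags', queryset=Tag.objects.select_related('category'))
    )  # 'tags' is the related_name of Tag.book; 'category' is the field name.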

If you want to try a different approach that also uses the serializer to cut down the number of queries, you can check this article.
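
As one illustration of a serializer-side idea, the per-tag existence check from the question could in principle be replaced by a single batched lookup: collect the distinct category ids, ask the database once which of them exist, and report the rest as invalid. The sketch below uses the standard-library sqlite3 module. The table name and both helpers (make_db, missing_category_ids) are hypothetical. In Django, a similar single query could presumably be written as TagCategory.objects.filter(pk__in=ids); how to wire that into the serializer's validation is an assumption left open here.

```python
import sqlite3


def make_db():
    """Build an in-memory database with two tag categories (ids 36 and 37)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tagcategory (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO tagcategory (id, title) VALUES (?, ?)",
        [(36, "plot intro"), (37, "character intro")],
    )
    return conn


def missing_category_ids(conn, tags):
    """Return the category ids used in `tags` that are absent from the table.

    Sends at most one query regardless of how many tags are passed in.
    Assumes the number of distinct ids stays below SQLite's bound-parameter limit.
    """
    wanted = {tag["category"] for tag in tags}
    if not wanted:
        return set()
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT id FROM tagcategory WHERE id IN ({placeholders})",
        tuple(wanted),
    ).fetchall()
    return wanted - {row[0] for row in rows}
```

An empty result means every referenced category exists. Any ids that come back can be turned into a single validation error before anything is written.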

John Moutafis answered Oct 19 '22 13:10