 

Optimizing database queries in Django REST framework


I have the following models:

    class User(models.Model):
        name = models.CharField()
        email = models.EmailField()


    class Friendship(models.Model):
        from_friend = models.ForeignKey(User)
        to_friend = models.ForeignKey(User)

And those models are used in the following view and serializer:

    class GetAllUsers(generics.ListAPIView):
        authentication_classes = (SessionAuthentication, TokenAuthentication)
        permission_classes = (permissions.IsAuthenticated,)
        serializer_class = GetAllUsersSerializer
        model = User

        def get_queryset(self):
            return User.objects.all()


    class GetAllUsersSerializer(serializers.ModelSerializer):
        is_friend_already = serializers.SerializerMethodField('get_is_friend_already')

        class Meta:
            model = User
            fields = ('id', 'name', 'email', 'is_friend_already',)

        def get_is_friend_already(self, obj):
            request = self.context.get('request', None)

            # Runs one extra query for every user being serialized.
            if request.user != obj and Friendship.objects.filter(from_friend=obj).exists():
                return True
            else:
                return False

So, for each user returned by the GetAllUsers view, I want to output whether that user is already a friend of the requester. (I should really check both from_friend and to_friend, but that does not matter for this question.)

What I see is that for N users in the database, there is 1 query to fetch all N users, and then N more queries (one per user) in the serializer's get_is_friend_already.

Is there a way to avoid this in the REST framework way? Maybe by passing the serializer a queryset that already includes the relevant Friendship rows, something like select_related?

asked Oct 27 '14 17:10 by dowjones123



1 Answer

Django REST Framework cannot automatically optimize queries for you, in the same way that Django itself won't. There are places you can look at for tips, including the Django documentation. It has been suggested that Django REST Framework should do this automatically, though there are some challenges associated with that.

This question is very specific to your case, where you are using a custom SerializerMethodField that runs a query for each object that is returned. Because you are building a new queryset each time (using the Friendship.objects manager), it is very difficult to optimize the query.

You can improve the situation, though, by not creating a new queryset and instead getting the friend information from elsewhere. This requires a reverse relation on the Friendship model, most likely through the related_name parameter on the field, so you can prefetch all of the Friendship objects. But this is only worthwhile if you need the full objects, and not just a count of them.

This would result in a view and serializer similar to the following:

    class Friendship(models.Model):
        from_friend = models.ForeignKey(User, related_name="friends")
        to_friend = models.ForeignKey(User)


    class GetAllUsers(generics.ListAPIView):
        ...

        def get_queryset(self):
            return User.objects.all().prefetch_related("friends")


    class GetAllUsersSerializer(serializers.ModelSerializer):
        ...

        def get_is_friend_already(self, obj):
            request = self.context.get('request', None)

            # obj.friends is a manager; .all() reuses the prefetched rows
            # instead of hitting the database again.
            friends = set(friend.to_friend_id for friend in obj.friends.all())

            if request.user != obj and request.user.id in friends:
                return True
            else:
                return False
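To make the difference in query counts concrete, here is a framework-free sketch using only the standard library's sqlite3 module. It is not what Django executes verbatim; the table names (app_user, app_friendship), the sample rows, and the CountingConnection helper are all made up for illustration. It contrasts the per-row lookup with the "fetch everything once, then check in memory" idea behind prefetch_related.

```python
import sqlite3
from collections import defaultdict


class CountingConnection:
    """Hypothetical helper: wraps a connection and counts SELECTs issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.selects = 0

    def select(self, sql, params=()):
        self.selects += 1
        return self.conn.execute(sql, params).fetchall()


def build_db():
    """Create an in-memory database with illustrative tables and rows."""
    db = CountingConnection()
    db.conn.executescript(
        """
        CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE app_friendship (
            id INTEGER PRIMARY KEY,
            from_friend_id INTEGER NOT NULL REFERENCES app_user(id),
            to_friend_id INTEGER NOT NULL REFERENCES app_user(id)
        );
        INSERT INTO app_user (id, name)
            VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee');
        INSERT INTO app_friendship (from_friend_id, to_friend_id)
            VALUES (2, 1), (3, 1), (4, 2);
        """
    )
    return db


def flags_n_plus_one(db, requester_id):
    """One query for the users, then one more query per user."""
    result = {}
    for user_id, _name in db.select("SELECT id, name FROM app_user ORDER BY id"):
        rows = db.select(
            "SELECT 1 FROM app_friendship "
            "WHERE from_friend_id = ? AND to_friend_id = ? LIMIT 1",
            (user_id, requester_id),
        )
        result[user_id] = user_id != requester_id and bool(rows)
    return result


def flags_prefetched(db, requester_id):
    """Two queries in total, however many users there are."""
    users = db.select("SELECT id, name FROM app_user ORDER BY id")
    targets = defaultdict(set)  # from_friend_id -> set of to_friend_id
    for from_id, to_id in db.select(
        "SELECT from_friend_id, to_friend_id FROM app_friendship"
    ):
        targets[from_id].add(to_id)
    return {
        user_id: user_id != requester_id and requester_id in targets[user_id]
        for user_id, _name in users
    }


if __name__ == "__main__":
    slow, fast = build_db(), build_db()
    print(flags_n_plus_one(slow, 1), "SELECTs:", slow.selects)
    print(flags_prefetched(fast, 1), "SELECTs:", fast.selects)
```

With the four sample users, the first function issues five SELECTs (one plus one per user) and the second issues two, while both produce the same flags.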

If you just need a count of the objects (similar to using queryset.count() or queryset.exists()), you can annotate the rows in the queryset with the counts of the reverse relationships. This is done in your get_queryset method by adding .annotate(friends_count=Count("friends")) to the end (if the related_name is friends), which sets the friends_count attribute on each object to the number of friends.

This would result in a view and serializer similar to the following:

    class Friendship(models.Model):
        from_friend = models.ForeignKey(User, related_name="friends")
        to_friend = models.ForeignKey(User)


    class GetAllUsers(generics.ListAPIView):
        ...

        def get_queryset(self):
            from django.db.models import Count

            return User.objects.all().annotate(friends_count=Count("friends"))


    class GetAllUsersSerializer(serializers.ModelSerializer):
        ...

        def get_is_friend_already(self, obj):
            request = self.context.get('request', None)

            if request.user != obj and obj.friends_count > 0:
                return True
            else:
                return False

Both of these solutions will avoid N+1 queries, but the one you pick depends on what you are trying to achieve.
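One caveat with the count-based variant: Count("friends") counts every Friendship row for a user, not only the rows that involve the requester. If the flag has to be specific to the requester and cover both directions (from_friend and to_friend), the single-query shape is a correlated EXISTS subquery. Reasonably recent Django versions provide Exists and OuterRef in django.db.models, which may be usable to express this as an annotation. The sketch below only shows the underlying SQL idea through the standard library's sqlite3 module; the table names, sample rows, and function names are assumptions for illustration, not Django's generated SQL.

```python
import sqlite3

# Illustrative schema and rows; names are hypothetical.
SCHEMA = """
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_friendship (
    id INTEGER PRIMARY KEY,
    from_friend_id INTEGER NOT NULL REFERENCES app_user(id),
    to_friend_id INTEGER NOT NULL REFERENCES app_user(id)
);
INSERT INTO app_user (id, name)
    VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee'), (5, 'eve');
INSERT INTO app_friendship (from_friend_id, to_friend_id)
    VALUES (2, 1), (3, 1), (4, 2), (1, 5);
"""


def make_connection():
    """Return an in-memory database loaded with the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def friend_flags(conn, requester_id):
    """Compute the per-user flag for one requester in a single query.

    The subquery is evaluated per user row inside the database, so the
    application sends exactly one statement.
    """
    sql = """
        SELECT u.id,
               u.id != :me AND EXISTS (
                   SELECT 1
                   FROM app_friendship f
                   WHERE (f.from_friend_id = u.id AND f.to_friend_id = :me)
                      OR (f.from_friend_id = :me AND f.to_friend_id = u.id)
               ) AS is_friend_already
        FROM app_user u
        ORDER BY u.id
    """
    return {
        user_id: bool(flag)
        for user_id, flag in conn.execute(sql, {"me": requester_id})
    }


if __name__ == "__main__":
    connection = make_connection()
    print(friend_flags(connection, 1))
    print(friend_flags(connection, 2))
```

The serializer method would then only read the precomputed flag from each row instead of querying again.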

answered Oct 10 '22 11:10 by Kevin Brown-Silva