I have two models, Item and ItemGroup:
class ItemGroup(models.Model):
    group_name = models.CharField(max_length=50)
    # fields..

class Item(models.Model):
    item_name = models.CharField(max_length=50)
    item_group = models.ForeignKey(ItemGroup, on_delete=models.CASCADE)
    # other fields..
I want to write a serializer that will fetch all item groups with their item list as a nested array.
So I want this output:
[ {"group_name": "item group name", "items": [... list of items ...] }, ... ]
As I understand it, I should write this with Django REST framework:
class ItemGroupSerializer(serializers.ModelSerializer):
    class Meta:
        model = ItemGroup
        fields = ('item_set', 'group_name')
This means I have to write a serializer for ItemGroup (not for Item).
To avoid many queries I pass this queryset:
ItemGroup.objects.filter(**filters).prefetch_related('item_set')
The problem I see is that, for a large dataset, prefetch_related results in an extra query with a very large SQL IN clause, which I could avoid by querying the Item objects instead:
Item.objects.filter(**filters).select_related('item_group')
This results in a JOIN, which is way better.
Is it possible to query Item instead of ItemGroup, and still get the same serialization output?
With prefetch_related you will have two queries plus the big IN clause issue, although it is a proven and portable approach.
I will give a solution that is more of an example, based on your field names. It uses a serializer for Item together with your select_related queryset, overrides the list method of the view, and transforms the data of that serializer into the representation you want. It uses only one query, and parsing the results is O(n), so it should be fast.
You might need to refactor get_data in order to add more fields to your results.
from rest_framework import serializers


class ItemSerializer(serializers.ModelSerializer):
    # Flatten the related group's name onto each item row.
    group_name = serializers.CharField(source='item_group.group_name')

    class Meta:
        model = Item
        fields = ('item_name', 'group_name')


class ItemGSerializer(serializers.Serializer):
    # Output shape: one entry per group with a list of item names.
    group_name = serializers.CharField(max_length=50)
    items = serializers.ListField(child=serializers.CharField(max_length=50))
In the view:
from rest_framework import viewsets
from rest_framework.response import Response

from . import models, serializers  # assumed module layout


class ItemGroupViewSet(viewsets.ModelViewSet):
    model = models.Item
    serializer_class = serializers.ItemSerializer
    queryset = models.Item.objects.select_related('item_group').all()

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        page = self.paginate_queryset(queryset)
        if page is not None:
            serializer = self.get_serializer(page, many=True)
            data = self.get_data(serializer.data)
            s = serializers.ItemGSerializer(data, many=True)
            return self.get_paginated_response(s.data)
        serializer = self.get_serializer(queryset, many=True)
        data = self.get_data(serializer.data)
        s = serializers.ItemGSerializer(data, many=True)
        return Response(s.data)

    @staticmethod
    def get_data(data):
        # Single pass that merges consecutive rows of the same group.
        # Rows of one group must be adjacent (e.g. order the queryset by
        # item_group), otherwise that group appears more than once.
        result, current_group = [], None
        for elem in data:
            if current_group is None:
                current_group = {'group_name': elem['group_name'], 'items': [elem['item_name']]}
            else:
                if elem['group_name'] == current_group['group_name']:
                    current_group['items'].append(elem['item_name'])
                else:
                    result.append(current_group)
                    current_group = {'group_name': elem['group_name'], 'items': [elem['item_name']]}
        if current_group is not None:
            result.append(current_group)
        return result
Here is my result with my fake data:
[{
    "group_name": "group #2",
    "items": [
        "first item",
        "2 item",
        "3 item"
    ]
},
{
    "group_name": "group #1",
    "items": [
        "g1 #1",
        "g1 #2",
        "g1 #3"
    ]
}]
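The single pass in get_data only merges rows that sit next to each other. As a minimal sketch, assuming the same flat rows with group_name and item_name keys, the grouping can also be done with a dictionary so that row order no longer matters. The function name group_items and the sample rows are hypothetical, made up for illustration.

```python
def group_items(rows):
    """Group flat {'group_name', 'item_name'} rows into nested dicts.

    Rows of the same group do not have to be adjacent.
    """
    grouped = {}  # dicts keep insertion order, so groups come out first-seen first
    for row in rows:
        grouped.setdefault(row["group_name"], []).append(row["item_name"])
    return [{"group_name": name, "items": items} for name, items in grouped.items()]


# Hypothetical rows, deliberately interleaved between two groups.
rows = [
    {"item_name": "first item", "group_name": "group #2"},
    {"item_name": "g1 #1", "group_name": "group #1"},
    {"item_name": "2 item", "group_name": "group #2"},
]
nested = group_items(rows)
```

This is still O(n). It keys on the group name, so two distinct groups sharing a name would be merged; keying on the group's id instead avoids that if the id is included in the serialized rows.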
Let's start off with the basics
So this means that in order to get a serializer which can serialize a list of ItemGroup and Item objects in a nested representation, it has to be given that list in the first place. You've accomplished that so far using a query on the ItemGroup model that calls prefetch_related to get the related Item objects. You've also identified that prefetch_related triggers a second query to get those related objects, and this isn't satisfactory.
prefetch_related is used to get multiple related objects
What does this mean exactly? When you are querying for a single object, like a single ItemGroup, you use prefetch_related to get a relationship containing multiple related objects, like a reverse foreign key (one-to-many) or a many-to-many relationship that's been defined. Django intentionally uses a second query to get these objects for a few reasons:
select_related is often non-performant when you force it to do a join against a second table. This is because a right outer join would be required in order to ensure that no ItemGroup objects that do not contain an Item are missed.
The query issued by prefetch_related is an IN on an indexed key field (the foreign key column, which Django indexes by default), which is one of the most performant queries out there.
The second query only requests Item objects it knows exist, so it can efficiently handle duplicates (in the case of many-to-many relationships) without having to do an additional subquery.
All of this is a way to say: prefetch_related is doing exactly what it should do, and it's doing that for a reason.
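To make the difference between the two query shapes concrete, here is a rough sketch using the standard library sqlite3 module instead of the Django ORM. The table names item_group and item, and the sample rows, are assumptions made for illustration (Django would prefix the real tables with the app label), and the SQL only approximates what the ORM would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_group (id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        item_name TEXT,
        item_group_id INTEGER REFERENCES item_group(id)
    );
    INSERT INTO item_group VALUES (1, 'group #1'), (2, 'group #2'), (3, 'empty group');
    INSERT INTO item VALUES (1, 'g1 #1', 1), (2, 'g1 #2', 1), (3, 'first item', 2);
""")

# prefetch_related style: one query for the groups, then one IN query for items.
groups = conn.execute("SELECT id, group_name FROM item_group ORDER BY id").fetchall()
ids = [group_id for group_id, _ in groups]
placeholders = ",".join("?" * len(ids))
items = conn.execute(
    "SELECT item_name, item_group_id FROM item "
    f"WHERE item_group_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()
by_group = {group_id: [] for group_id in ids}
for item_name, group_id in items:
    by_group[group_id].append(item_name)
prefetch_style = [(name, by_group[group_id]) for group_id, name in groups]

# select_related style, starting from item: a single inner join.
join_rows = conn.execute(
    "SELECT g.group_name, i.item_name FROM item i "
    "JOIN item_group g ON g.id = i.item_group_id ORDER BY g.id, i.id"
).fetchall()
join_group_names = {group_name for group_name, _ in join_rows}
```

The two-query version keeps the group that has no items, with an empty list. The join that starts from item never sees that group, which is the outer join concern raised above.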
But I want to do this with select_related anyway
Alright, alright. That's what was asked for, so let's see what can be done.
There are a few ways to accomplish this, all of which have their pros and cons and none of which work without some manual "stitching" work in the end. I am making the assumption that you aren't using the built-in ViewSet or generic views provided by DRF, but if you are, then the stitching must happen in the filter_queryset method to allow the built-in filtering to work. Oh, and it probably breaks pagination or makes it almost useless.
The original set of filters is being applied to the ItemGroup object. And since this is being used in an API, these are probably dynamic and you don't want to lose them. So, you are going to need to apply filters in one of two ways:
Generate the filters and then prefix them with the related name
So you would generate your normal foo=bar filters and then prefix them before passing them to filter(), so it'd be related__foo=bar. This may have some performance implications since you're now filtering across relationships.
Generate the original subquery and then pass it to the Item query directly
This is probably the "cleanest" solution, except you're generating an IN query with performance comparable to the prefetch_related one. In practice it performs worse, since this is treated as an uncacheable subquery instead.
Implementing either of these is realistically out of the scope of this question, since we want to be able to "flip and stitch" the Item and ItemGroup objects so the serializer works.
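For what it's worth, the prefixing option is small when the filters are plain keyword lookups. The following is a minimal sketch under that assumption; the helper name prefix_filters and the sample lookups are hypothetical, and it does not handle Q objects or other non-keyword filters.

```python
def prefix_filters(filters, relation):
    """Return a copy of `filters` with each lookup prefixed by `relation__`."""
    return {f"{relation}__{lookup}": value for lookup, value in filters.items()}


# Hypothetical filters that were originally written against ItemGroup.
group_filters = {"group_name__icontains": "tools", "id__in": [1, 2]}

# The same lookups, rewritten so they can be applied to an Item queryset.
item_filters = prefix_filters(group_filters, "item_group")
```

The result could then be passed as Item.objects.filter(**item_filters).select_related('item_group'), where item_group is the foreign key name from the question.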
Flipping the Item query so you get a list of ItemGroup objects
Taking the query given in the original question, where select_related is being used to grab all of the ItemGroup objects alongside the Item objects, you are returned a queryset full of Item objects. We actually want a list of ItemGroup objects, since we're working with an ItemGroupSerializer, so we're going to have to "flip it" around.
from collections import defaultdict

items = Item.objects.filter(**filters).select_related('item_group')

item_groups_to_items = defaultdict(list)  # group id -> list of Item objects
item_groups_by_id = {}                    # group id -> ItemGroup object

for item in items:
    item_group = item.item_group
    item_groups_by_id[item_group.id] = item_group
    item_groups_to_items[item_group.id].append(item)
I am intentionally using the id of the ItemGroup as the key for the dictionaries since most Django models are not immutable, and sometimes people override the hashing method to be something other than the primary key.
This will get you a mapping of ItemGroup objects to their related Item objects, which is ultimately what you need in order to "stitch" them together again.
Stitching the ItemGroup objects back with their related Item objects
This part isn't actually difficult to do, since you have all of the related objects already.
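for item_group_id, item_group_items in item_groups_to_items.items():
    item_group = item_groups_by_id[item_group_id]
    # NOTE: Django 2.0+ raises TypeError on direct assignment to a reverse
    # foreign key such as item_set; use another attribute name there.
    item_group.item_set = item_group_items

item_groups = list(item_groups_by_id.values())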
This will get you all of the ItemGroup objects that were requested and have them stored as a list in the item_groups variable. Each ItemGroup object will have the list of related Item objects set in the item_set attribute. You will probably want to rename this so it doesn't conflict with the automatically generated reverse foreign key of the same name; Django 2.0 and later raise a TypeError on direct assignment to that reverse relation.
From here, you can use it as you normally would in your ItemGroupSerializer, and it should work for serialization.
You can make this generic (and unreadable) pretty quickly, for use in other similar scenarios:
from collections import defaultdict


def flip_and_stitch(items, group_from_item, store_in):
    item_groups_to_items = defaultdict(list)  # group id -> list of items
    item_groups_by_id = {}                    # group id -> group object

    # Flip: collect each group once, along with the items that point to it.
    for item in items:
        item_group = getattr(item, group_from_item)
        item_groups_by_id[item_group.id] = item_group
        item_groups_to_items[item_group.id].append(item)

    # Stitch: attach each list of items to its group.
    for item_group_id, item_group_items in item_groups_to_items.items():
        item_group = item_groups_by_id[item_group_id]
        setattr(item_group, store_in, item_group_items)

    return list(item_groups_by_id.values())
And you'd just call this as
item_groups = flip_and_stitch(items, 'item_group', 'item_set')
Where:
items is the queryset of items that you requested originally, with the select_related call already applied.
item_group is the attribute on the Item object where the related ItemGroup is stored.
item_set is the attribute on the ItemGroup object where the list of related Item objects should be stored.
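The behavior of the helper can be checked without a database. The sketch below repeats the function so it runs on its own and feeds it SimpleNamespace stand-ins in place of model instances. The sample objects and the attribute name items (used instead of item_set to avoid the reverse foreign key name) are assumptions for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def flip_and_stitch(items, group_from_item, store_in):
    items_by_group_id = defaultdict(list)
    groups_by_id = {}
    for item in items:
        group = getattr(item, group_from_item)
        groups_by_id[group.id] = group
        items_by_group_id[group.id].append(item)
    for group_id, group_items in items_by_group_id.items():
        setattr(groups_by_id[group_id], store_in, group_items)
    return list(groups_by_id.values())


# Stand-ins for ItemGroup and Item instances (hypothetical data).
tools = SimpleNamespace(id=1, group_name="tools")
toys = SimpleNamespace(id=2, group_name="toys")
sample_items = [
    SimpleNamespace(item_name="hammer", item_group=tools),
    SimpleNamespace(item_name="robot", item_group=toys),
    SimpleNamespace(item_name="saw", item_group=tools),
]

groups = flip_and_stitch(sample_items, "item_group", "items")
summary = [(g.group_name, [i.item_name for i in g.items]) for g in groups]
```

Each group appears once, in the order it was first seen, with its items attached. Groups with no matching items never show up, since the input only contains items.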