Jump to edit to see more real-life code example, that doesn't work after changing the query order
Here are my models:
class ModelA(models.Model):
field_1a = models.CharField(max_length=32)
field_2a = models.CharField(max_length=32)
class ModelB(models.Model):
field_1b = models.CharField(max_length=32)
field_2b = models.CharField(max_length=32)
Now, create 2 instances each:
ModelA.objects.create(field_1a="1a1", field_2a="1a2")
ModelA.objects.create(field_1a="2a1", field_2a="2a2")
ModelB.objects.create(field_1b="1b1", field_2b="1b2")
ModelB.objects.create(field_1b="2b1", field_2b="2b2")
If I'll query for only one model with annotations, I get something like that:
>>> ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values("field1", "field2")
[{"field1": "1a1", "field2": "1a2"}, {"field1": "2a1", "field2": "2a2"}]
This is correct behavior. The problem starts, when I want to get union of those two models:
# model A first, with annotate
query = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))
# now union with model B, also annotated
query = query.union(ModelB.objects.all().annotate(field1=F("field_1b"), field2=F("field_2b")))
# get only field1 and field2
query = query.values("field1", "field2")
# the results are skewed:
assert list(query) == [
{"field1": 1, "field2": "1a1"},
{"field1": 1, "field2": "1b1"},
{"field1": 2, "field2": "2a1"},
{"field1": 2, "field2": "2b1"},
]
The assert passes correctly, which means that the results are wrong. It seems like the values()
didn't match the variable name, it just iterated over the object as on a tuple. The value of field1
is actually the object's ID, and field2
is field1
.
This is pretty easy to fix in such simple models, but my real models are quite complex, and they have a different number of fields. How do I union them correctly?
Below you can find an extended example that fails regardless of the order of union()
and values()
- the models are slightly bigger now, and it seems that the different fields count somehow confuses Django:
# models
class ModelA(models.Model):
field_1a = models.CharField(max_length=32)
field_1aa = models.CharField(max_length=32, null=True)
field_1aaa = models.CharField(max_length=32, null=True)
field_2a = models.CharField(max_length=32)
extra_a = models.CharField(max_length=32)
class ModelB(models.Model):
extra = models.CharField(max_length=32)
field_1b = models.CharField(max_length=32)
field_2b = models.CharField(max_length=32)
# test
ModelA.objects.create(field_1a="1a1", field_2a="1a2", extra_a="1extra")
ModelA.objects.create(field_1a="2a1", field_2a="2a2", extra_a="2extra")
ModelB.objects.create(field_1b="1b1", field_2b="1b2", extra="3extra")
ModelB.objects.create(field_1b="2b1", field_2b="2b2", extra="4extra")
values = ("field1", "field2", "extra")
query = (
ModelA.objects.all()
.annotate(
field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
)
.values(*values)
)
query = query.union(
ModelB.objects.all()
.annotate(field1=F("field_1b"), field2=F("field_2b"))
.values(*values)
)
# outcome
assert list(query) == [
{"field1": "1a1", "field2": "1a2", "extra": "1extra"},
{"field1": "2a1", "field2": "2a2", "extra": "2extra"},
{"field1": "3extra", "field2": "1b1", "extra": "1b2"},
{"field1": "4extra", "field2": "2b1", "extra": "2b2"},
]
After some debugging and going through the source code, I have an idea why this is happening. What I am going to do is try to explain that why doing annotate
+ values
results in displaying the id
and what is the difference between the two cases above.
To keep things simple, I will write also write the possible resulting sql query for each statement.
annotate
first but get values
on union queryqs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))
When writing something like this, django will get all the fields + annotated fields, so the resulting sql query looks like:
select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA
So, if we have a query
which is the result of:
qs = qs1.union(qs2)
the resulting sql for django looks like:
(select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA)
UNION
(select id, field_1b, field_2b, field_1b as field1, field_2b as field2 from ModelB)
Let's go deeper into how this sql is generated. When we do a union
, a combinator
and combined_queries
is set on the qs.query
and the resulting sql is generated by combining the sql of individual queries. So, in summary:
qs.sql == qs1.sql UNION qs2.sql # in abstract sense
When, we do qs.values('field1', 'field2')
, the col_count
in compiler is set to 2 which is the number of fields. As you can see that the union query above returns 5 columns but in the final return from compiler each row in the results is sliced using col_count
. Now, this results
with only 2 columns is passed back to ValuesIterable
where it maps each name in the selected fields with the resulting columns. That is how it leads to the incorrect results.
annotate
+ values
on individual queries and then perform union
Now, let's see what happens when annotate
is used with values
directly
qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values('field1', 'field2')
The resulting sql is:
select field_1a as field1, field_2a as field2 from ModelA
Now, when we do the union:
qs = qs1.union(qs2)
the sql is:
(select field_1a as field1, field_2a as field2 from ModelA)
UNION
(select field_1b as field1, field_2b as field2 from ModelB)
Now, when qs.values('field1', 'field2')
executes, the number of columns returned from union query has 2 columns which is same as the col_count
which is 2 and each field is matched with the individual columns producing the expected result.
In the OP, there is a scenario when even using .values
before union
doesn't produce correct results. The reason for that is that in the ModelB
, there is no annotation for extra
field.
So, let's look at the queries generated for each model:
ModelA.objects.all()
.annotate(
field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
)
.values(*values)
The SQL becomes:
select field_1a as field1, field_2a as field2, extra_a as extra from ModelA
For ModelB:
ModelB.objects.all()
.annotate(field1=F("field_1b"), field2=F("field_2b"))
.values(*values)
SQL:
select extra, field_1b as field1, field_2b as field2 from ModelB
and the union is:
(select field_1a as field1, field_2a as field2, extra_a as extra from ModelA)
UNION
(select extra, field_1b as field1, field_2b as field2 from ModelB)
Because annotated fields are listed after the real db fields, the extra
of ModelB
is mixed with field1
of ModelB
. TO make sure that you get correct results, please make sure that the ordering of fields in generated SQL is always correct - with or without annotation. In this case, I will suggest to annotate extra
on ModelB
as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With