When combining QuerySets, what's the difference between the QuerySet.union()
method and using the OR operator between QuerySets |
?
Consider the following 2 QuerySets:
qs1 = Candidate.objects.filter(id=1)
qs2 = Candidate.objects.filter(id=2)
How is qs1 | qs2
different from qs1.union(qs2)
? Is there some subtlety under the hood that I'm missing?
A QuerySet is a collection of data from a database. A QuerySet is built up as a list of objects. QuerySets makes it easier to get the data you actually need, by allowing you to filter and order the data.
Use union operator for queryset | to take union of two queryset. If both queryset belongs to same model / single model than it is possible to combine querysets by using union operator. One other way to achieve combine operation between two queryset is to use itertools chain function.
annotate()that has been computed over the objects that are related to the objects in the QuerySet . Each argument to annotate() is an annotation that will be added to each object in the QuerySet that is returned. The aggregation functions that are provided by Django are described in Aggregation Functions below.
The SQL UNION Operator. The UNION operator is used to combine the result-set of two or more SELECT statements. Every SELECT statement within UNION must have the same number of columns. The columns must also have similar data types. The columns in every SELECT statement must also be in the same order.
The following table contains the comparison query operators: It is used to match the values of the fields that are equal to a specified value. It is used to match all values of the field that are not equal to a specified value. It is used to match values of the fields that are greater than a specified value.
Hints the system how many concurrent subqueries of the union operator should be executed in parallel. Default: Amount of CPU cores on the single node of the cluster (2 to 16). Hints the system how many nodes should be used by the concurrent union subqueries execution.
Operands are the subexpressions or objects involved in an expression (Boolean or not) and connected by an operator. Boolean or logical operators are AND (logical AND or conjunction), OR (logical OR or disjunction), and NOT (logical NOT or negation). The keywords and, or, and not are the Python operators for these operations.
From the QuerySet API reference
union(), intersection(), and difference() return model instances of the type of the first QuerySet even if the arguments are QuerySets of other models.
The .union() Method return QuerySet with schema/column Names of only the first QuerySet Parameter Passed. Where as this is not the case with OR(|) Operator.
From the QuerySet API reference:
The UNION operator selects only distinct values by default. To allow duplicate values, use the all=True argument.
The .union()
method allows some granularity in specifying whether to keep or eliminate duplicate records returned. This choice is not available with the OR operator.
Also, QuerySets created by a .union()
call cannot have .distinct()
called on them.
This is more a SQL question than a Django question. In the example you post, the Django ORM will translate qs1 | qs2
as something along the lines of
SELECT * FROM candidate WHERE id=1 OR id=2
whereas in qs1.union(qs2)
it will be something like
SELECT * FROM candidate WHERE id=1 UNION SELECT * FROM candidate WHERE id=2
In this particular example there will be no difference, however I don't believe anyone would write it with UNION.
If you have an expensive query, there will be a difference in the timing when you choose one format over the other. You can use EXPLAIN to experiment. In some tests I make UNION takes way longer to give you the first row, but finishes a bit faster.
If query optimization is not an issue, it is more common to use OR.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With