Can anyone tell me why when I add the order_by()
the query that gets output changes from a INNER JOIN
to an LEFT OUTER JOIN
?
Is there any way to preserve the INNER JOIN
-ness?
data = models.RetailSalesFact.objects.values('customer_key__customer_state',
'date_key__calendar_month_name')
data = data.filter(date_key__calendar_year=year)
data = data.annotate(sales=Sum('sales_quantity'))
data = data.order_by('date_key__calendar_month_name')
Before:
SELECT Customer_Dimension.Customer_State, Date_Dimension.Calendar_Month_Name,
SUM(Retail_Sales_Fact.Sales_Quantity) AS sales
FROM Retail_Sales_Fact
INNER JOIN Customer_Dimension
ON (Retail_Sales_Fact.Customer_Key = Customer_Dimension.Customer_Key)
INNER JOIN Date_Dimension
ON (Retail_Sales_Fact.Date_Key = Date_Dimension.Date_Key)
WHERE Date_Dimension.Calendar_Year = ?
GROUP BY Customer_Dimension.Customer_State,
Date_Dimension.Calendar_Month_Name
ORDER BY Date_Dimension.Calendar_Month_Name ASC
After:
SELECT Customer_Dimension.Customer_State, Date_Dimension.Calendar_Month_Name,
SUM(Retail_Sales_Fact.Sales_Quantity) AS sales
FROM Retail_Sales_Fact
INNER JOIN Customer_Dimension
ON (Retail_Sales_Fact.Customer_Key = Customer_Dimension.Customer_Key)
LEFT OUTER JOIN Date_Dimension
ON (Retail_Sales_Fact.Date_Key = Date_Dimension.Date_Key)
WHERE Date_Dimension.Calendar_Year = ?
GROUP BY Customer_Dimension.Customer_State,
Date_Dimension.Calendar_Month_Name
ORDER BY Date_Dimension.Calendar_Month_Name ASC
You're putting a filter on the outer table (date_dimension__calendar_year=year), so there will be no difference between the result set, whether you use an inner join or a left outer join.
The order-by is processed on an intermediate result-set -- if done on the inner-joined tables, then it has to be done after the tables are combined -- which means read one: combine records; read two: order the combined records.
But if the order-by is done on only the outer-joined table, which in this case is all you're asking for, then your query optimizer may be able be able to avoid reading the entire set twice, and instead, only read the outer table twice. Your optimizer might recognize this as a savings in terms of processing power.
It's just a guess. Your result set should turn-out the same either way. I wonder if you could time it both ways, and see which one takes longer.
I'd guess the ORM is doing a LEFT JOIN
because it can't tell if the INNER JOIN
's where restriction is more or less restrictive than the ordering clause. Since it thinks it needs to order every record, regardless of whether it matches or not.
You can force an INNER JOIN
by using Raw SQL. Or maybe you can fool the ORM by applying the order_by
before the filter
?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With