Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django order_by causes LEFT JOIN

Can anyone tell me why when I add the order_by() the query that gets output changes from a INNER JOIN to an LEFT OUTER JOIN?

Is there any way to preserve the INNER JOIN-ness?

data = models.RetailSalesFact.objects.values('customer_key__customer_state', 
                                             'date_key__calendar_month_name')
data = data.filter(date_key__calendar_year=year)
data = data.annotate(sales=Sum('sales_quantity'))
data = data.order_by('date_key__calendar_month_name')

Before:

SELECT Customer_Dimension.Customer_State, Date_Dimension.Calendar_Month_Name,
       SUM(Retail_Sales_Fact.Sales_Quantity) AS sales 
    FROM Retail_Sales_Fact  
    INNER JOIN Customer_Dimension 
        ON (Retail_Sales_Fact.Customer_Key = Customer_Dimension.Customer_Key) 
    INNER JOIN Date_Dimension 
        ON (Retail_Sales_Fact.Date_Key = Date_Dimension.Date_Key) 
    WHERE Date_Dimension.Calendar_Year = ?  
    GROUP BY Customer_Dimension.Customer_State, 
             Date_Dimension.Calendar_Month_Name 
    ORDER BY Date_Dimension.Calendar_Month_Name ASC

After:

SELECT Customer_Dimension.Customer_State, Date_Dimension.Calendar_Month_Name, 
       SUM(Retail_Sales_Fact.Sales_Quantity) AS sales 
    FROM Retail_Sales_Fact 
    INNER JOIN Customer_Dimension 
        ON (Retail_Sales_Fact.Customer_Key = Customer_Dimension.Customer_Key) 
    LEFT OUTER JOIN Date_Dimension 
        ON (Retail_Sales_Fact.Date_Key = Date_Dimension.Date_Key) 
    WHERE Date_Dimension.Calendar_Year = ?  
    GROUP BY Customer_Dimension.Customer_State, 
             Date_Dimension.Calendar_Month_Name 
    ORDER BY Date_Dimension.Calendar_Month_Name ASC
like image 436
Al W Avatar asked May 09 '11 23:05

Al W


2 Answers

You're putting a filter on the outer table (date_dimension__calendar_year=year), so there will be no difference between the result set, whether you use an inner join or a left outer join.

The order-by is processed on an intermediate result-set -- if done on the inner-joined tables, then it has to be done after the tables are combined -- which means read one: combine records; read two: order the combined records.

But if the order-by is done on only the outer-joined table, which in this case is all you're asking for, then your query optimizer may be able be able to avoid reading the entire set twice, and instead, only read the outer table twice. Your optimizer might recognize this as a savings in terms of processing power.

It's just a guess. Your result set should turn-out the same either way. I wonder if you could time it both ways, and see which one takes longer.

like image 67
Chains Avatar answered Sep 16 '22 12:09

Chains


I'd guess the ORM is doing a LEFT JOIN because it can't tell if the INNER JOIN's where restriction is more or less restrictive than the ordering clause. Since it thinks it needs to order every record, regardless of whether it matches or not.

You can force an INNER JOIN by using Raw SQL. Or maybe you can fool the ORM by applying the order_by before the filter?

like image 42
Leopd Avatar answered Sep 18 '22 12:09

Leopd