Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency of joining subqueries in SQL Server

I have a customers and orders table in SQL Server 2008 R2. Both have indexes on the customer id (called id). I need to return details about all customers in the customers table and information from the orders table, such as details of the first order.

I currently left join my customers table on a subquery of the orders table, with the subquery returning the information I need about the orders. For example:

SELECT c.id
        ,c.country      
        ,First_orders.product
        ,First_orders.order_id
FROM customers c

LEFT JOIN   SELECT( id, 
                    product 
            FROM (SELECT    id
                            ,product
                            ,order_id
                            ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No 
                        FROM orders) orders
            WHERE Order_no = 1) First_Orders
ON c.id = First_orders.id

I'm quite new to SQL and want to understand if I'm doing this efficiently. I end up left joining quite a few subqueries like this onto the customers table in one select query and it can take tens of minutes to run.

So am I doing this efficiently or can it be improved? For example, I'm not sure if my index on id in the orders table is of any use and maybe I could speed up the query by creating a temporary table of what is in the subquery first and creating a unique index on id in the temporary table so SQL Server knows id is now a unique column and then joining my customers table to this temporary table? I typically have one or two million rows in the customers and orders tables.

Many thanks in advance!

like image 725
user2112153 Avatar asked Feb 26 '13 17:02

user2112153


2 Answers

You can remove one of your subqueries to make it a little more efficient:

SELECT c.id
        ,c.country      
        ,First_orders.product
        ,First_orders.order_id
FROM customers c
   LEFT JOIN  (SELECT id
                    ,product
                    ,order_id
                    ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No 
               FROM orders) First_Orders
     ON c.id = First_orders.id AND First_Orders.order_No = 1

In your above query, you need to be careful where you place your parentheses as I don't think it will work. Also, you're returning product in your results, but not including in your nested subquery.

like image 80
sgeddes Avatar answered Oct 22 '22 04:10

sgeddes


For someone who is just learning SQL, your query looks pretty good.

The index on customers may or may not be used for the query -- you would need to look at the execution plan. An index on orders(id, order_date) could be used quite effectively for the row_number function.

One comment is on the naming of fields. The field orders.id should not be the customer id. That should be something like 'orders.Customer_Id`. Keeping the naming system consistent across tables will help you in the future.

like image 42
Gordon Linoff Avatar answered Oct 22 '22 05:10

Gordon Linoff