I'm confused by an SQL query, and honestly, it's one of those things that I'm not even sure how to google for. Thus StackOverflow.
I have what I think is a simple query.
SELECT Id
FROM Customer
WHERE Id IN (SELECT Id from @CustomersWithCancelledOrders)
Here's where I find the weirdness. There is no column called Id in the @CustomersWithCancelledOrders table variable. But there isn't an error.
What this results in is the Ids for all Customers. Every single one. Which obviously defeats the point of doing a sub-query in the first place.
It's as if it's using the Id column from the outer table (Customer), but I don't understand why it would do that. Is there ever a reason you would want to do that? Am I missing something incredibly obvious?
SQLFiddle of the weirdness. It's not the best SQL Fiddle, as I couldn't find a way to return multiple result sets on that website, but it demonstrates how I ran across the issue.
I suppose what I'm looking for is a name for the "feature" above, some sort of information about why it does what it does and what the incorrect query actually means.
I've updated the question above to use a slightly better example. It's still contrived, but it's closer to the script I wrote when I actually encountered the issue.
After doing some reading on correlated subqueries, it looks like my typo (using the wrong Id column in the subquery) turns it into a correlated subquery, which changes its behaviour.
Instead of evaluating the results of the subquery once and then treating those results as a set (which was what I intended) it evaluates the subquery for every row in the outer query.
This means the subquery produces a different set of results for every row and, as long as the table variable isn't empty, that set is guaranteed to contain the customer Id of that row. The subquery returns a set consisting of that row's Id repeated X times, where X is the number of rows in the table variable being selected from.
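To see the per-row evaluation described above, here is a small reproduction sketch. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for SQL Server, and a plain table named CustomersWithCancelledOrders with a hypothetical CustomerId column in place of the @table variable. SQLite resolves an unqualified column that the inner table lacks against the outer query, just as described in this thread.

```python
import sqlite3

# SQLite stands in for SQL Server; a regular table plays the role of the
# @CustomersWithCancelledOrders table variable (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    -- Note: no Id column here, only CustomerId.
    CREATE TABLE CustomersWithCancelledOrders (CustomerId INTEGER);
    INSERT INTO Customer (Id, Name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO CustomersWithCancelledOrders (CustomerId) VALUES (2), (2);
""")

# The typo: Id does not exist in the inner table, so it binds to the outer
# Customer.Id and the subquery becomes correlated (re-evaluated per row).
typo = """
    SELECT Id FROM Customer
    WHERE Id IN (SELECT Id FROM CustomersWithCancelledOrders)
    ORDER BY Id
"""

# What was intended: reference the inner table's own column.
intended = """
    SELECT Id FROM Customer
    WHERE Id IN (SELECT CustomerId FROM CustomersWithCancelledOrders)
    ORDER BY Id
"""

def ids(sql):
    # Collect the first column of every row into a plain list.
    return [row[0] for row in conn.execute(sql)]

typo_result = ids(typo)          # [1, 2, 3]: every customer matches itself
intended_result = ids(intended)  # [2]: only the customer with cancellations

# With an empty inner table there is nothing to repeat the outer Id for,
# so the same typo query suddenly returns no rows at all.
conn.execute("DELETE FROM CustomersWithCancelledOrders")
empty_result = ids(typo)         # []

print(typo_result, intended_result, empty_result)
```

The empty-table case is worth noticing: the typo does not always return "everything", which can make the bug harder to spot in testing.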
...
It's really hard to write down a concise description of my understanding of the issue. Sorry. I think I'm good now, though.
The go-to solution for removing duplicate rows from your result sets is to include the DISTINCT keyword in your SELECT statement. It tells the query engine to remove duplicates so that every row in the result set is unique.
Disadvantages of subqueries: the MySQL optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join. Also, in MySQL you cannot modify a table and select from the same table within a subquery in the same SQL statement.
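As a rough illustration of rewriting an IN subquery as a join, and of why DISTINCT often comes along with that rewrite, here is a sketch using SQLite through Python's sqlite3 module. The Customer and CancelledOrder tables are hypothetical, and this says nothing about MySQL's optimizer or performance; it only shows that the two forms return the same customers once duplicates are removed.

```python
import sqlite3

# Hypothetical schema: one customer can have several cancelled orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY);
    CREATE TABLE CancelledOrder (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    INSERT INTO Customer (Id) VALUES (1), (2), (3);
    INSERT INTO CancelledOrder (OrderId, CustomerId) VALUES (10, 2), (11, 2), (12, 3);
""")

def ids(sql):
    # Collect the first column of every row into a plain list.
    return [row[0] for row in conn.execute(sql)]

# Subquery form: each customer appears at most once.
subquery_ids = ids("""
    SELECT c.Id FROM Customer AS c
    WHERE c.Id IN (SELECT o.CustomerId FROM CancelledOrder AS o)
    ORDER BY c.Id
""")

# Plain join: customer 2 matches two orders, so it appears twice.
join_ids = ids("""
    SELECT c.Id FROM Customer AS c
    JOIN CancelledOrder AS o ON o.CustomerId = c.Id
    ORDER BY c.Id
""")

# DISTINCT collapses the duplicates the join introduced.
distinct_ids = ids("""
    SELECT DISTINCT c.Id FROM Customer AS c
    JOIN CancelledOrder AS o ON o.CustomerId = c.Id
    ORDER BY c.Id
""")

print(subquery_ids, join_ids, distinct_ids)  # [2, 3] [2, 2, 3] [2, 3]
```

Using explicit aliases (c, o) in both forms also means a misspelled column fails loudly instead of silently binding to the outer table.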
Duplicate names within a single CTE definition aren't allowed. The number of column names specified must match the number of columns in the result set of the CTE_query_definition.
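A minimal sketch of the column-count rule for CTEs, assuming SQLite (through Python's sqlite3 module) rather than SQL Server; SQLite also rejects a CTE whose column-name list does not match the number of columns its query returns. The CTE name cte and columns a and b are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two column names for a query that returns one column: rejected.
bad = "WITH cte(a, b) AS (SELECT 1) SELECT * FROM cte"
# Column names match the result set one-for-one: accepted.
good = "WITH cte(a, b) AS (SELECT 1, 2) SELECT a, b FROM cte"

rejected = False
try:
    conn.execute(bad)
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)

good_rows = conn.execute(good).fetchall()
print(good_rows)  # [(1, 2)]
```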
It's intended behaviour: in a subquery you can access the outer query's column names. So when the subquery's table has no Id column, the unqualified Id is resolved against the outer table, and the query uses that Id instead of raising an error.
That's why you should qualify column names with aliases or table names when working with subqueries.
For example, see
http://support.microsoft.com/kb/298674
SELECT ID
FROM [Table]
WHERE ID IN (SELECT OtherTable.ID FROM OtherTable)
This will generate an error if OtherTable has no ID column. As Allan S. Hanses said, in the subquery you can use columns from the main query.
See this example:
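A sketch of how qualifying the column turns the silent bug into an error, assuming SQLite via Python's sqlite3 module (which accepts the [Table] bracket quoting used above) and a hypothetical OtherTable whose only column is OtherID. The unqualified query quietly matches every row; the qualified and aliased versions both fail to compile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Table] (ID INTEGER);
    CREATE TABLE OtherTable (OtherID INTEGER);  -- hypothetical: no ID column
    INSERT INTO [Table] (ID) VALUES (1), (2);
    INSERT INTO OtherTable (OtherID) VALUES (2);
""")

# Unqualified: ID falls through to the outer [Table].ID, so every row matches.
unqualified_ids = sorted(
    row[0]
    for row in conn.execute(
        "SELECT ID FROM [Table] WHERE ID IN (SELECT ID FROM OtherTable)"
    )
)
print(unqualified_ids)  # [1, 2]: silently wrong

# Qualified by table name or by alias: the missing column is reported.
errors = []
for sql in (
    "SELECT ID FROM [Table] WHERE ID IN (SELECT OtherTable.ID FROM OtherTable)",
    "SELECT t.ID FROM [Table] AS t WHERE t.ID IN (SELECT o.ID FROM OtherTable AS o)",
):
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))

print(errors)
```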
SELECT ID
FROM [Table]
WHERE ID IN (SELECT ID)
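A quick way to confirm that the inner ID in the example above really comes from the outer row is to run it; this sketch assumes SQLite via Python's sqlite3 module, where a SELECT with no FROM clause can still reference columns of the enclosing query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Table] (ID INTEGER);
    INSERT INTO [Table] (ID) VALUES (1), (2), (3);
""")

# The inner SELECT has no FROM clause, so ID can only come from the outer
# row: for each row the subquery returns that row's own ID.
rows = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM [Table] WHERE ID IN (SELECT ID) ORDER BY ID"
    )
]
print(rows)  # [1, 2, 3]
```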