So I have a simple table called temp that can be created by: <pre class="prettyprint"><code>CREATE TABLE temp (value int, id int not null primary key); INSERT INTO temp VALUES(0,1), (0,2), (0,3), (0,4), (1,5), (1,6), (1,7), (1,8); </code></pre> I have a second table temp2 that can be created by: <pre class="prettyprint"><code>CREATE TABLE temp (value int, id int); INSERT INTO temp VALUES(0,1), (0,2), (0,3), (0,4), (1,5), (1,6), (1,7), (1,8); </code></pre> The only difference between temp and temp2 is that the id field is the primary key in temp, and temp2 has no primary key. I'm not sure how, but I am getting differing results with the following query: <pre class="prettyprint"><code>select * from temp where id in ( select id from ( select id, ROW_NUMBER() over (partition by value order by value) rownum from temp ) s1 where rownum = 1 ) </code></pre> This is the result for temp: <pre class="prettyprint"><code>value id ----------- ----------- 0 1 0 2 0 3 0 4 1 5 1 6 1 7 1 8 </code></pre> and this is what I get when temp is replaced by temp2 (THE CORRECT RESULT): <pre class="prettyprint"><code>value id ----------- ----------- 0 1 1 5 </code></pre> When running the inner-most query (s1), the expected results are retrieved: <pre class="prettyprint"><code>id rownum ----------- -------------------- 1 1 2 2 3 3 4 4 5 1 6 2 7 3 8 4 </code></pre> When just running the <code>in</code> statement query on both, I also get the expected result: <pre class="prettyprint"><code>id ----------- 1 5 </code></pre> I cannot figure out what the reason for this could possibly be. Is this a bug? Notes: temp2 was created with a simple <code>select * into temp2 from temp</code>. I am running SQL Server 2008. My apologies if this is a known glitch. It is difficult to search for this since it requires an <code>in</code> statement. An "equivalent" query that uses a join does produce the correct results on both tables. Edit: dbfiddle showing the differences: Unexpected Results Expected Results

I can't specifically answer your question, but changing the <code>ORDER BY</code> fixes the problem. <code>partition by value order by value</code> doesn't really make sense, and it looks like the problem is "fooling" SQL Server; as you're partitioning the rows by the same value you're ordering by, every row is "row number 1" as they could all be at the start. Don't forget, a table is an unordered heap; even when it has a Primary Key (clustered or not). If you change your <code>ORDER BY</code> to <code>id</code> instead the problem goes away. <pre class="prettyprint"><code>SELECT * FROM temp2 t2 WHERE t2.id IN (SELECT s1.id FROM (SELECT sq.id, ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY sq.id) AS rownum FROM temp2 sq) s1 WHERE s1.rownum = 1); </code></pre> In fact, changing the <code>ORDER BY</code> clause to anything else fixes the problem: <pre class="prettyprint"><code>SELECT * FROM temp2 t2 WHERE t2.id IN (SELECT s1.id FROM (SELECT sq.id, ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY (SELECT NULL)) AS rownum FROM temp2 sq) s1 WHERE s1.rownum = 1); </code></pre> So the problem is that your are using the same expression (column) for both your <code>PARTITION BY</code> and <code>ORDER BY</code> clause; meaning that any of those rows could be row number 1, and none of them; thus all are returned. It doesn't make sense for both to be the same, so they should be different. Still, this problem does persist in SQL Server 2017 (and I suspect 2019) so you might want to raise a support ticket with them anyway (but as you're using 2008 don't expect it to get fixed, as your support is about to end). As comments can be deleted without notice I wanted to add @scsimon's comment and my response: <blockquote> scsimon: Interesting. Changing <code>rownum = 2</code> gives expected results without changing <code>order by</code>. I think it's a bug. Larnu: I agree at @scsimon. I suspect that changing the <code>WHERE</code> to <code>s1.rownum = 2</code> effectively forces the data engine to actually determine the values of <code>rownum</code>, rather than assume every row is "equal"; as if that were the case none would be returned. Even so, changing the <code>WHERE</code> to <code>s1.rownum = 2</code> is still resigning to "return a random row", if the <code>PARTITION BY</code> and <code>ORDER BY</code> clauses are the same </blockquote>

IN statement inconsistency with PRIMARY KEY

Tags:

sql-server

sql-server-2008

primary-key

So I have a simple table called temp that can be created by:

CREATE TABLE temp (value int, id int not null primary key);
INSERT INTO temp
VALUES(0,1),
      (0,2),
      (0,3),
      (0,4),
      (1,5),
      (1,6),
      (1,7),
      (1,8);

I have a second table temp2 that can be created by:

CREATE TABLE temp (value int, id int);
INSERT INTO temp
VALUES(0,1),
      (0,2),
      (0,3),
      (0,4),
      (1,5),
      (1,6),
      (1,7),
      (1,8);

The only difference between temp and temp2 is that the id field is the primary key in temp, and temp2 has no primary key. I'm not sure how, but I am getting differing results with the following query:

select * from temp
where id in (
    select id
    from (
        select id, ROW_NUMBER() over (partition by value order by value) rownum
        from temp
    ) s1
    where rownum = 1
)

This is the result for temp:

value       id
----------- -----------
0           1
0           2
0           3
0           4
1           5
1           6
1           7
1           8

and this is what I get when temp is replaced by temp2 (THE CORRECT RESULT):

value       id
----------- -----------
0           1
1           5

When running the inner-most query (s1), the expected results are retrieved:

id          rownum
----------- --------------------
1           1
2           2
3           3
4           4
5           1
6           2
7           3
8           4

When just running the in statement query on both, I also get the expected result:

id
-----------
1
5

I cannot figure out what the reason for this could possibly be. Is this a bug?

Notes: temp2 was created with a simple select * into temp2 from temp. I am running SQL Server 2008. My apologies if this is a known glitch. It is difficult to search for this since it requires an in statement. An "equivalent" query that uses a join does produce the correct results on both tables.

Edit: dbfiddle showing the differences: Unexpected Results Expected Results

311

asked Jun 03 '19 20:06

Jason

1 Answers

I can't specifically answer your question, but changing the ORDER BY fixes the problem. partition by value order by value doesn't really make sense, and it looks like the problem is "fooling" SQL Server; as you're partitioning the rows by the same value you're ordering by, every row is "row number 1" as they could all be at the start. Don't forget, a table is an unordered heap; even when it has a Primary Key (clustered or not).

If you change your ORDER BY to id instead the problem goes away.

SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
                FROM (SELECT sq.id,
                             ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY sq.id) AS rownum
                      FROM temp2 sq) s1
                WHERE s1.rownum = 1);

In fact, changing the ORDER BY clause to anything else fixes the problem:

SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
                FROM (SELECT sq.id,
                             ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY (SELECT NULL)) AS rownum
                      FROM temp2 sq) s1
                WHERE s1.rownum = 1);

So the problem is that your are using the same expression (column) for both your PARTITION BY and ORDER BY clause; meaning that any of those rows could be row number 1, and none of them; thus all are returned. It doesn't make sense for both to be the same, so they should be different.

Still, this problem does persist in SQL Server 2017 (and I suspect 2019) so you might want to raise a support ticket with them anyway (but as you're using 2008 don't expect it to get fixed, as your support is about to end).

As comments can be deleted without notice I wanted to add @scsimon's comment and my response:

scsimon: Interesting. Changing rownum = 2 gives expected results without changing order by. I think it's a bug.

Larnu: I agree at @scsimon. I suspect that changing the WHERE to s1.rownum = 2 effectively forces the data engine to actually determine the values of rownum, rather than assume every row is "equal"; as if that were the case none would be returned.
Even so, changing the WHERE to s1.rownum = 2 is still resigning to "return a random row", if the PARTITION BY and ORDER BY clauses are the same

116

answered Sep 25 '22 11:09

Larnu

Related questions
                            
                                SqlDataReader and SQL Server 2016 FOR JSON splits json in chunks of 2k bytes
                            
                                Drop index in SQL Server
                            
                                R: Best Practices - dplyr and odbc multi table actions (retrieved from SQL)
                            
                                SQL71501 - How to Suppress this Error
                            
                                Connect Metabase to Sql Server
                            
                                Use join column only once in SELECT * [duplicate]
                            
                                MS SQL - Query takes too long to run
                            
                                TRY_PARSE vs TRY_CONVERT
                            
                                How can I auto increment the primary key in SQL Server 2016 merge insert without sequences?
                            
                                Transaction Deadlocks on SQL Server Insert with EF Core
                            
                                SQL Server : view MUCH slower than same query by itself
                            
                                How do I get EntityFrameworkCore generated SQL to use the right format for DateTime objects?
                            
                                SQL Server - Is there any way to recover a corrupted database without backup? [closed]
                            
                                Recommended way to read data from an Excel file using an Azure App Service?
                            
                                Get Lead value over multiple partitions
                            
                                SQL Server and allocate time into hour parts
                            
                                How to remove the square brackets in child rows of a json from SQL Server 2016?
                            
                                OData Destination in SSIS
                            
                                characters appearing incorrectly even with Unicode source and destination (SSIS)
                            
                                Problem in exporting SSRS report to CSV format

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With