So I have a simple table called temp that can be created by:
CREATE TABLE temp (value int, id int not null primary key);
INSERT INTO temp
VALUES(0,1),
(0,2),
(0,3),
(0,4),
(1,5),
(1,6),
(1,7),
(1,8);
I have a second table temp2 that can be created by:
CREATE TABLE temp (value int, id int);
INSERT INTO temp
VALUES(0,1),
(0,2),
(0,3),
(0,4),
(1,5),
(1,6),
(1,7),
(1,8);
The only difference between temp and temp2 is that the id field is the primary key in temp, and temp2 has no primary key. I'm not sure how, but I am getting differing results with the following query:
select * from temp
where id in (
select id
from (
select id, ROW_NUMBER() over (partition by value order by value) rownum
from temp
) s1
where rownum = 1
)
This is the result for temp:
value id
----------- -----------
0 1
0 2
0 3
0 4
1 5
1 6
1 7
1 8
and this is what I get when temp is replaced by temp2 (THE CORRECT RESULT):
value id
----------- -----------
0 1
1 5
When running the inner-most query (s1), the expected results are retrieved:
id rownum
----------- --------------------
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
When just running the in
statement query on both, I also get the expected result:
id
-----------
1
5
I cannot figure out what the reason for this could possibly be. Is this a bug?
Notes: temp2 was created with a simple select * into temp2 from temp
. I am running SQL Server 2008. My apologies if this is a known glitch. It is difficult to search for this since it requires an in
statement. An "equivalent" query that uses a join does produce the correct results on both tables.
Edit: dbfiddle showing the differences: Unexpected Results Expected Results
A Primary Key violation occur when there is an attempt to insert a value into a database table, but the entry already exists.
Each table can only have one primary key. Access can automatically create a primary key field for you when you create a table, or you can specify the fields that you want to use as the primary key.
Every table should have some column (or set of columns) that uniquely identifies one and only one row. It makes it much easier to maintain the data. It's true, without a primary key (or some unique key), you don't have an insertion anomaly if you go to insert the same data multiple times.
The primary key's main function is to uniquely identify each row in the table - but it doesn't imply any (physical) sorting per se.
I can't specifically answer your question, but changing the ORDER BY
fixes the problem. partition by value order by value
doesn't really make sense, and it looks like the problem is "fooling" SQL Server; as you're partitioning the rows by the same value you're ordering by, every row is "row number 1" as they could all be at the start. Don't forget, a table is an unordered heap; even when it has a Primary Key (clustered or not).
If you change your ORDER BY
to id
instead the problem goes away.
SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
FROM (SELECT sq.id,
ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY sq.id) AS rownum
FROM temp2 sq) s1
WHERE s1.rownum = 1);
In fact, changing the ORDER BY
clause to anything else fixes the problem:
SELECT *
FROM temp2 t2
WHERE t2.id IN (SELECT s1.id
FROM (SELECT sq.id,
ROW_NUMBER() OVER (PARTITION BY sq.value ORDER BY (SELECT NULL)) AS rownum
FROM temp2 sq) s1
WHERE s1.rownum = 1);
So the problem is that your are using the same expression (column) for both your PARTITION BY
and ORDER BY
clause; meaning that any of those rows could be row number 1, and none of them; thus all are returned. It doesn't make sense for both to be the same, so they should be different.
Still, this problem does persist in SQL Server 2017 (and I suspect 2019) so you might want to raise a support ticket with them anyway (but as you're using 2008 don't expect it to get fixed, as your support is about to end).
As comments can be deleted without notice I wanted to add @scsimon's comment and my response:
scsimon: Interesting. Changing
rownum = 2
gives expected results without changingorder by
. I think it's a bug.Larnu: I agree at @scsimon. I suspect that changing the
WHERE
tos1.rownum = 2
effectively forces the data engine to actually determine the values ofrownum
, rather than assume every row is "equal"; as if that were the case none would be returned.
Even so, changing theWHERE
tos1.rownum = 2
is still resigning to "return a random row", if thePARTITION BY
andORDER BY
clauses are the same
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With