I've been working on a project and came across some interesting behavior when using SELECT INTO. If I have a table with a column defined as int identity(1,1) not null
and use SELECT INTO to copy it, the new table will retain the IDENTITY property unless there is a join involved. If there is a join, then the same column on the new table is defined simply as int not null
.
Here is a script that you can run to reproduce the behavior:
CREATE TABLE People (Id INT IDENTITY(1,1) not null, Name VARCHAR(10))
CREATE TABLE ReverseNames (Name varchar(10), ReverseName varchar(10))
INSERT INTO People (Name)
VALUES ('John'), ('Jamie'), ('Joe'), ('Jenna')
INSERT INTO ReverseNames (Name, ReverseName)
VALUES ('John','nhoJ'), ('Jamie','eimaJ'), ('Joe','eoJ'), ('Jenna','anneJ')
--------
SELECT Id, Name
INTO People_ExactCopy
FROM People
SELECT Id, ReverseName as Name
INTO People_WithJoin
FROM People
JOIN ReverseNames
ON People.Name = ReverseNames.Name
SELECT Id, (SELECT ReverseName FROM ReverseNames WHERE Name = People.Name) as Name
INTO People_WithSubSelect
FROM People
--------
SELECT OBJECT_NAME(c.object_id) as [Table],
c.is_identity as [Id Column Retained Identity]
FROM sys.columns c
where
OBJECT_NAME(c.object_id) IN ('People_ExactCopy','People_WithJoin','People_WithSubSelect')
AND c.name = 'Id'
--------
DROP TABLE People
DROP TABLE People_ExactCopy
DROP TABLE People_WithJoin
DROP TABLE People_WithSubSelect
DROP TABLE ReverseNames
I noticed that the execution plans for both the WithJoin and WithSubSelect queries contained one join operator. I'm not sure if one will be significantly better on performance if we were dealing with a larger set of rows.
Can anyone shed any light on this and tell me if there is a way to utilize SELECT INTO with joins and still preserve the IDENTITY property?
From Microsoft:
When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:
The SELECT statement contains a join, GROUP BY clause, or aggregate function. Multiple SELECT statements are joined by using UNION. The identity column is listed more than one time in the select list. The identity column is part of an expression. The identity column is from a remote data source.
If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. If an identity column is required in the new table but such a column is not available, or you want a seed or increment value that is different than the source identity column, define the column in the select list using the IDENTITY function.
You could use the IDENTITY
function as they suggest and omit the IDENTITY
column, but then you would lose the values, as the IDENTITY
function would generate new values and I don't think that those are easily determinable, even with ORDER BY
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With