Get identity of row inserted in Snowflake Datawarehouse

Tags:

sql

snowflake-cloud-data-platform

If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table and get the ID of the row I just created. I know that Stack Overflow questions generally need some code that was attempted or some research effort, but I'm not sure where to begin with Snowflake. I've dug through their documentation and found nothing for this.

The best I could do so far is try result_scan() and last_query_id(), but these only confirm that a row was inserted; they don't give me any information about the row itself.

I believe what I'm asking for is along the lines of MS SQL Server's SCOPE_IDENTITY() function.

Is there a Snowflake equivalent function for MS SQL Server's SCOPE_IDENTITY()?

EDIT: for the sake of having code in here:

CREATE TABLE my_db..my_table
(
    ROWID INT IDENTITY(1,1),
    some_number INT,
    a_time TIMESTAMP_LTZ(9),
    b_time TIMESTAMP_LTZ(9),
    more_data VARCHAR(10)
);
INSERT INTO my_db..my_table
(
    some_number,
    a_time,
    more_data
)
VALUES
(1, my_time_value, some_data);

I want to get the auto-incremented ROWID of the row I just inserted.

asked Dec 18 '18 17:12

Joshua Schlichting

1 Answer

NOTE: The answer below can give an incorrect result in some very rare cases; see the UPDATE section below.

Original answer

Snowflake does not provide an equivalent of SCOPE_IDENTITY today.

However, you can exploit Snowflake's time travel to retrieve the maximum value of a column right after a given statement is executed.

Here's an example:

create or replace table x(rid int identity, num int);
insert into x(num) values(7);
insert into x(num) values(9);
-- you can insert rows in a separate transaction now to test it
select max(rid) from x AT(statement=>last_query_id());
----------+
 MAX(RID) |
----------+
 2        |
----------+

You can also save the result of last_query_id() in a session variable if you want to use it later, e.g.:

insert into x(num) values(5);
set qid = last_query_id();
...
select max(rid) from x AT(statement=>$qid);

Note: this is usually correct, but the query returns the largest rid in the table, not necessarily the one just generated. If a user manually inserts a large value into rid, for example, that value will be returned instead.
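To make that caveat concrete, here is a minimal sketch. It assumes the table x from the example above already exists, that its generated rid values are still well below 1000, and that an explicit value may be supplied for the identity column; it has not been run against a live account.

```sql
-- Sketch only: assumes table x(rid int identity, num int) from the example above.
insert into x(rid, num) values (1000, 1);  -- explicit rid supplied by hand
insert into x(num) values (2);             -- rid generated by the IDENTITY column

-- MAX picks up the manually inserted 1000 rather than the rid
-- generated for the INSERT that last_query_id() refers to.
select max(rid) from x at(statement => last_query_id());
```

The time travel clause pins the point in time correctly here; the wrong answer comes purely from MAX looking at every row in the table.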

UPDATE

I later realized that the code above can, in rare cases, return an incorrect answer.

The execution order of the various phases of a query in a distributed system like Snowflake can be non-deterministic, and Snowflake allows concurrent INSERT statements, so the following might happen:

- Two queries, Q1 and Q2, do a simple single row INSERT, start at roughly the same time
- Q1 starts, is a bit ahead
- Q2 starts
- Q1 creates a row with value 1 from the IDENTITY column
- Q2 creates a row with value 2 from the IDENTITY column
- Q2 gets ahead of Q1 - this is the key part
- Q2 commits, is marked as finished at time T2
- Q1 commits, is marked as finished at time T1

Note that T1 is later than T2. Now, when we run SELECT ... AT(statement=>Q1), we see the table as of T1, which includes every change committed before T1 and therefore the value 2 from Q2. MAX(rid) then returns 2 instead of the value 1 that Q1 actually generated, which is not what we want.

A way around it could be to tag each INSERT with its own unique identifier (e.g. a value taken from a separate SEQUENCE object), store that identifier in an extra column, and then take the MAX only over the rows that carry it.
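One way this workaround might look is sketched below. The sequence name x_tag_seq, the column insert_tag and the variable tag are hypothetical names introduced only for illustration, and the sketch has not been run against a live account.

```sql
-- Hypothetical objects: x_tag_seq and insert_tag are illustrative names.
create or replace sequence x_tag_seq;
create or replace table x(rid int identity, num int, insert_tag int);

-- Reserve a tag before inserting; a sequence never hands out the same value twice,
-- so no concurrent session can end up with this tag.
set tag = (select x_tag_seq.nextval);

insert into x(num, insert_tag) values (7, $tag);

-- Only rows written by the INSERT above carry $tag, so rows committed
-- by other sessions cannot affect the result.
select max(rid) from x where insert_tag = $tag;
```

Because the lookup filters on the tag rather than on a point in time, it does not depend on the order in which concurrent statements commit, and the AT(statement=>...) clause is no longer needed. For a multi-row INSERT that shares one tag, MAX returns the largest rid among the rows of that statement.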

Sorry. Distributed transactions are hard :)

answered Nov 16 '22 12:11

Marcin Zukowski
