
Snowflake Using Streams to Track Updates/Deletes to a Table

I am having trouble understanding how Streams work in terms of tracking changes. I would like to create a history table that tracks every UPDATE and DELETE to a table, but I am finding I do not understand how this works.

If I have table Table1 with a Stream:

 CREATE TABLE Table1
 (
   XID INT IDENTITY PRIMARY KEY,
   FIELD1 INT,
   FIELD2 STRING,
   DATECREATED TIMESTAMP DEFAULT CURRENT_TIMESTAMP::TIMESTAMP
 );

 CREATE STREAM Table1_History ON TABLE Table1;

If I insert data:

INSERT INTO Table1 (FIELD1,FIELD2)
VALUES
(101,'String1'),
(102,'String2')
;

Then run:

SELECT * FROM Table1_History;

It returns the following:

XID  FIELD1  FIELD2   DATECREATED              METADATA$ACTION  METADATA$ISUPDATE  METADATA$ROW_ID
1    101     String1  2020-08-13 06:52:34.402  INSERT           FALSE              23bc7a4d83522484f4d7e36edf84b4c7986dfa9b
2    102     String2  2020-08-13 06:52:34.402  INSERT           FALSE              5b5e429cf3a174303b2f2192b5d602ed9dedd865

So far so good.

But if I run:

UPDATE Table1 SET FIELD1 = 1001 WHERE XID = 1;

Then select from Table1_History, I get:

SELECT * FROM Table1_History;

XID  FIELD1  FIELD2   DATECREATED              METADATA$ACTION  METADATA$ISUPDATE  METADATA$ROW_ID
1    1001    String1  2020-08-13 06:52:34.402  INSERT           FALSE              23bc7a4d83522484f4d7e36edf84b4c7986dfa9b
2    102     String2  2020-08-13 06:52:34.402  INSERT           FALSE              5b5e429cf3a174303b2f2192b5d602ed9dedd865

The METADATA$ACTION is still INSERT, and the FIELD1 value is now stored in the stream as 1001. There is no longer any record I can see that the row used to have a value of 101 and that it was updated.

If I run the following:

DELETE FROM Table1 WHERE XID = 2;

The stream now returns:

SELECT * FROM Table1_History;

XID  FIELD1  FIELD2   DATECREATED              METADATA$ACTION  METADATA$ISUPDATE  METADATA$ROW_ID
1    1001    String1  2020-08-13 06:52:34.402  INSERT           FALSE              23bc7a4d83522484f4d7e36edf84b4c7986dfa9b

There is now no record on the stream that the second row was ever in the table.

I don't get the point of a Stream for tracking UPDATEs and DELETEs. Is this not what streams are for?

I tried following this: Snowflake Streams Made Simple, but I still don't understand.

EliSquared asked Oct 14 '25 19:10


1 Answer

To quote the Snowflake documentation: "A stream stores the current transactional version of a table and is the appropriate source of CDC records in most scenarios."

Have a look at this example in the Snowflake documentation: https://docs.snowflake.com/en/user-guide/streams.html#example-1

My understanding is that a stream holds only the net change for each record since its current offset, and the offset advances only when the stream is consumed in a DML statement (a plain SELECT does not move it). So if you insert a record and then update it before advancing the offset, the stream shows a single INSERT whose fields hold the latest values.

If you then advance the offset and update or delete the record, those events will show in the stream. However, if you update and then delete the same record before advancing the offset, the stream shows only the DELETE, as that is the final state of that record.
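To make that concrete, here is a minimal sketch of consuming the stream so that the offset advances. Table1_Audit and its column names are hypothetical (they are not in the question); the only thing that matters is that the stream is read inside a DML statement such as INSERT ... SELECT.

```sql
-- Hypothetical history table; the name and the extra columns are assumptions.
CREATE TABLE Table1_Audit
(
  XID INT,
  FIELD1 INT,
  FIELD2 STRING,
  DATECREATED TIMESTAMP,
  ACTION STRING,
  ISUPDATE BOOLEAN,
  ROW_ID STRING,
  CAPTURED_AT TIMESTAMP DEFAULT CURRENT_TIMESTAMP::TIMESTAMP
);

-- Reading the stream in a DML statement consumes it and advances the offset.
INSERT INTO Table1_Audit (XID, FIELD1, FIELD2, DATECREATED, ACTION, ISUPDATE, ROW_ID)
SELECT XID, FIELD1, FIELD2, DATECREATED,
       METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
FROM Table1_History;

-- The stream is now empty, so the next change is recorded relative to the new offset.
UPDATE Table1 SET FIELD1 = 2002 WHERE XID = 1;

SELECT * FROM Table1_History;
```

With a standard stream, I would expect the last SELECT to show the update as a pair of rows for XID 1: a DELETE carrying the old values and an INSERT carrying the new ones, both with METADATA$ISUPDATE = TRUE. Any changes made between two runs of the INSERT ... SELECT are still collapsed into their net effect, so this only records as much history as the frequency with which you consume the stream.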

UPDATE 1: It sounds like you are trying to implement audit tracking for every change made to a record in a table. This is not what Streams are designed to do, and I don't think you would be able to implement a solution using Streams that is guaranteed to log every change.

If you read the Streams documentation it states "The stream can provide the set of changes from the current offset to the current transactional time of the source table (i.e. the current version of the table). The stream maintains only the delta of the changes; if multiple DML statements change a row, the stream contains only the latest action taken on that row."

CDC is a term specifically related to loading data warehouses; it is not meant as a generic term for capturing every change made to a record.

If you want to create a genuine auditing capability in Snowflake, then I'm afraid I don't know whether that is possible. The Time Travel feature shows that Snowflake retains all the changes made to a record (within the retention period), but I'm not aware of any way of accessing just these changes. I think you can only access the state of a record at chosen points in time, and you have no way of knowing when any changes were made.
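For reference, this is roughly what point-in-time access with Time Travel looks like. It is only a sketch: the offset and the timestamp are illustrative values and must fall inside the table's retention period and after the table was created.

```sql
-- Table1 as it was five minutes ago (the offset is in seconds).
SELECT * FROM Table1 AT(OFFSET => -60*5);

-- Table1 as of a specific moment (illustrative timestamp).
SELECT * FROM Table1 AT(TIMESTAMP => '2020-08-13 06:55:00'::TIMESTAMP_LTZ);
```

Each query returns a snapshot of the whole table at that moment, which is why it does not by itself tell you when a given row changed.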

UPDATE 2: I just realised that Snowflake allows change tracking on a table without necessarily using Streams. This is probably a better solution if you want to capture all changes to a table, not just the latest version. The functionality is documented here: https://docs.snowflake.com/en/sql-reference/constructs/changes.html
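A minimal sketch of how the CHANGES clause might be used against the table from the question. The session variable name ts1 and the sample UPDATE are my own choices for illustration.

```sql
-- Change tracking must be enabled before the start of the interval you query.
-- (Creating a stream on the table also enables it.)
ALTER TABLE Table1 SET CHANGE_TRACKING = TRUE;

-- Remember a starting point in a session variable.
SET ts1 = (SELECT CURRENT_TIMESTAMP());

-- Make a change after that point.
UPDATE Table1 SET FIELD2 = 'String1b' WHERE XID = 1;

-- Ask for the changes made since the starting point.
SELECT *
FROM Table1
  CHANGES(INFORMATION => DEFAULT)
  AT(TIMESTAMP => $ts1);
```

Unlike a stream, this query has no offset to consume, so you can rerun it or pick a different starting point as often as you like. As far as I can tell from the documentation, INFORMATION => DEFAULT still returns the net delta over the interval you ask for, so seeing every intermediate change would mean querying suitably narrow intervals.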

NickW answered Oct 17 '25 10:10