
How to consume continuous streaming data with the Snowflake connector for Kafka [closed]

Can anyone help me with consuming data that is being streamed continuously? What should be specified in the Snowflake connector for the topics?

I am able to populate an individual table with data for a given topic name, but my requirement is to capture a continuous data stream into the table.

Asked by Austin Jackson on Oct 07 '21


People also ask

Can Kafka connector directly load data from Kafka topics to staging table in Snowflake?

The Snowflake Connector for Kafka (“Kafka connector”) reads data from one or more Apache Kafka topics and loads the data into a Snowflake table.

Can Snowflake handle streaming data?

Snowflake can ingest streaming data through the Snowflake Connector for Kafka. In addition, Snowflake Snowpipe can help organizations seamlessly load continuously generated data into Snowflake.
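For reference, here is a minimal sketch of how such a connector might be registered through the Kafka Connect REST API (default port 8083). It is written in Python purely for illustration; every name, credential, and threshold below is a placeholder, and the full list of supported properties is in the Snowflake documentation linked in the second answer.

    # Sketch: register the Snowflake sink connector with a Kafka Connect
    # cluster via its REST API. All credentials and object names are
    # placeholders.
    import requests

    connector = {
        "name": "snowflake_sink",  # hypothetical connector name
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "tasks.max": "4",
            # Topics carrying the continuous stream.
            "topics": "orders,clickstream",
            # Optional: map each topic to a target table; otherwise the
            # connector derives a table name from the topic name.
            "snowflake.topic2table.map": "orders:ORDERS_RAW,clickstream:CLICKS_RAW",
            # Flush thresholds: a file is written to the internal stage
            # when any one of these is reached.
            "buffer.count.records": "10000",
            "buffer.flush.time": "60",
            "buffer.size.bytes": "5000000",
            "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
            "snowflake.user.name": "KAFKA_CONNECTOR_USER",
            "snowflake.private.key": "<private-key-contents>",
            "snowflake.database.name": "RAW",
            "snowflake.schema.name": "KAFKA",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
        },
    }

    resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
    resp.raise_for_status()
    print(resp.json())

Once registered, the connector keeps consuming the subscribed topics indefinitely and flushes to Snowflake whenever a buffer threshold is reached, so nothing extra has to be configured to capture a continuous stream.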


2 Answers

To stream data to Snowflake, we use Apache NiFi to pull from Kafka, modify/transform the data, save it to an S3 bucket, and then kick off a Snowpipe load (using a NiFi processor) to ingest it into Snowflake tables. Once the data is in Snowflake, we use dbt for further SQL-based transformations and enrichments.
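A rough Python equivalent of that "kick off a snowpipe" step, using the Snowflake Ingest SDK (pip install snowflake-ingest), might look like the sketch below. It assumes a pipe named RAW.KAFKA.EVENTS_PIPE already exists and copies from the stage the files land in; the account, user, key path, and file path are all placeholders.

    # Rough equivalent of the "kick off a snowpipe" step, using the
    # Snowflake Ingest SDK. Assumes the pipe RAW.KAFKA.EVENTS_PIPE
    # already COPYs from the stage the files land in; account, user,
    # key, and file path are placeholders.
    from snowflake.ingest import SimpleIngestManager, StagedFile

    with open("rsa_key.pem") as f:
        private_key = f.read()

    ingest_manager = SimpleIngestManager(
        account="myaccount",
        host="myaccount.snowflakecomputing.com",
        user="INGEST_USER",
        pipe="RAW.KAFKA.EVENTS_PIPE",
        private_key=private_key,
    )

    # Point Snowpipe at the newly landed file; the actual COPY runs
    # asynchronously on Snowflake-managed compute.
    resp = ingest_manager.ingest_files([StagedFile("2021/10/07/events-0001.json.gz", None)])
    print(resp)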

Answered by Jason Zondor on Jan 04 '23


From the documentation:

The Kafka connector completes the following process to subscribe to Kafka topics and create Snowflake objects:

The Kafka connector subscribes to one or more Kafka topics based on the configuration information provided via the Kafka configuration file or command line (or the Confluent Control Center; Confluent only).

The connector creates the following objects for each topic:

One internal stage to temporarily store data files for each topic.

One pipe to ingest the data files for each topic partition.

One table for each topic. If the table specified for each topic does not exist, the connector creates it; otherwise, the connector creates the RECORD_CONTENT and RECORD_METADATA columns in the existing table and verifies that the other columns are nullable (and produces an error if they are not).
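As an illustration of that last point, the sketch below (not from the linked documentation) queries a connector-created landing table through the Snowflake Python connector. RECORD_CONTENT and RECORD_METADATA are the columns the connector manages; the table name, connection details, and the event_type field are assumed placeholders.

    # Sketch: querying a table created by the Kafka connector with the
    # Snowflake Python connector. RECORD_CONTENT and RECORD_METADATA are
    # the connector-managed columns; the table name, credentials, and
    # the event_type field are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="myaccount",
        user="ANALYST",
        password="...",          # or key-pair / SSO authentication
        warehouse="QUERY_WH",
        database="RAW",
        schema="KAFKA",
    )

    cur = conn.cursor()
    cur.execute("""
        SELECT
            record_metadata:topic::string     AS topic,
            record_metadata:partition::int    AS kafka_partition,
            record_content:event_type::string AS event_type  -- placeholder field
        FROM ORDERS_RAW
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
    conn.close()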

Ingestion then proceeds as follows:

  1. One or more applications publish JSON or Avro records to a Kafka cluster. The records are split into one or more topic partitions.

  2. The Kafka connector buffers messages from the Kafka topics. When a threshold (time or memory or number of messages) is reached, the connector writes the messages to a temporary file in the internal stage. The connector triggers Snowpipe to ingest the temporary file. Snowpipe copies a pointer to the data file into a queue.

  3. A Snowflake-provided virtual warehouse loads data from the staged file into the target table (i.e. the table specified in the configuration file for the topic) via the pipe created for the Kafka topic partition.

  4. The connector monitors Snowpipe and deletes each file in the internal stage after confirming that the file data was loaded into the table. If a failure prevented the data from loading, the connector moves the file into the table stage and produces an error message.

  5. The connector repeats steps 2-4.
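For step 1, a minimal producer might look like the sketch below. It uses kafka-python, which is only one of many possible clients, and the broker address and topic name are placeholders; once records arrive on a subscribed topic, the connector handles steps 2-5 automatically.

    # Minimal producer for step 1, using kafka-python (one of several
    # possible clients). Broker address and topic are placeholders; the
    # connector subscribed to this topic handles steps 2-5 on its own.
    import json
    import time
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for i in range(100):
        # Each JSON record lands in the topic's table as RECORD_CONTENT,
        # with the Kafka coordinates in RECORD_METADATA.
        producer.send("orders", {"order_id": i, "status": "created", "ts": time.time()})
    producer.flush()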

https://docs.snowflake.com/en/user-guide/kafka-connector-overview.html

Answered by Robert Long on Jan 04 '23