I am writing this not to ask a question but to share what I learned. I was using Spark to connect to Snowflake from Databricks, but I could not access Snowflake. It seemed like something was wrong with the internal JDBC driver in Databricks.
Here is the error I got:
java.lang.NoClassDefFoundError: net/snowflake/client/jdbc/internal/snowflake/common/core/S3FileEncryptionMaterial
I tried many versions of the Snowflake JDBC driver and the Spark-Snowflake connector, but it seemed like I could not find a combination that worked.
Answer as given by the asker (extracted from the question for better site usability):
Step 1: Create a cluster with Spark version 2.3.0 and Scala version 2.11.
Step 2: Attach snowflake-jdbc-3.5.4.jar to the cluster.
https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc/3.5.4
Step 3: Attach the spark-snowflake_2.11-2.3.2 connector to the cluster.
https://mvnrepository.com/artifact/net.snowflake/spark-snowflake_2.11/2.3.2
Here is the sample code (Scala).
import org.apache.spark.sql.DataFrame

val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

val sfOptions = Map(
  "sfURL" -> "<snowflake_url>",
  "sfAccount" -> "<your account name>",
  "sfUser" -> "<your account user>",
  "sfPassword" -> "<your account pwd>",
  "sfDatabase" -> "<your database name>",
  "sfSchema" -> "<your schema name>",
  "sfWarehouse" -> "<your warehouse name>",
  "sfRole" -> "<your account role>",
  "region_id" -> "<your region name, if you are outside the US region>"
)

val df: DataFrame = sqlContext.read
  .format(SNOWFLAKE_SOURCE_NAME)
  .options(sfOptions)
  .option("dbtable", "<your table>")
  .load()
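Writing back to Snowflake goes through the same connector; only the direction changes. A minimal sketch, reusing the sfOptions map above (the target table name is a placeholder):

import org.apache.spark.sql.SaveMode

// Write the DataFrame back to Snowflake using the same connection options.
df.write
  .format(SNOWFLAKE_SOURCE_NAME)
  .options(sfOptions)
  .option("dbtable", "<your target table>")
  .mode(SaveMode.Overwrite)  // replaces the table; use SaveMode.Append to add rows instead
  .save()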
If you are using Databricks, there is a Databricks Snowflake connector created jointly by Databricks and Snowflake. You just have to provide a few items to create a Spark DataFrame (see below; copied from the Databricks documentation).
# Snowflake connection options
options = dict(
    sfUrl="<URL for your Snowflake account>",
    sfUser=user,
    sfPassword=password,
    sfDatabase="<The database to use for the session after connecting>",
    sfSchema="<The schema to use for the session after connecting>",
    sfWarehouse="<The default virtual warehouse to use for the session after connecting>"
)

df = spark.read \
    .format("snowflake") \
    .options(**options) \
    .option("dbtable", "<The name of the table to be read>") \
    .load()

display(df)
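If you only need a subset of the rows, the connector also accepts a query option in place of dbtable, so the filtering is pushed down and runs inside Snowflake rather than in Spark. A minimal Scala sketch, reusing SNOWFLAKE_SOURCE_NAME and the sfOptions map from the earlier example (the query itself is a placeholder):

// Push a query down to Snowflake instead of pulling a whole table into Spark.
val filtered = sqlContext.read
  .format(SNOWFLAKE_SOURCE_NAME)
  .options(sfOptions)
  .option("query", "SELECT * FROM <your table> WHERE <your condition>")
  .load()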
As long as you are accessing your own databases with all the access rights granted correctly, this only takes a few minutes, even on a first attempt.
Good luck!