Spark-sqlserver connection

Can we connect Spark to SQL Server? If so, how? I am new to Spark and want to connect it to the server so that I can work directly with the SQL Server data instead of uploading .txt or .csv files. Please help. Thank you.

asked Jan 17 '18 07:01

Tia

2 Answers

// Spark 2.x (spark-shell, where `spark` is the predefined SparkSession)

// Create a DataFrame on top of a SQL Server table or query.
// The Microsoft JDBC driver jar must be on the Spark classpath.
// Trailing dots keep the chain valid when pasted line by line into the shell.
val jdbcDF = spark.read.format("jdbc").
           option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").
           option("url", "jdbc:sqlserver://XXXXX.com:port;databaseName=xxx").
           option("dbtable", "(SELECT * FROM xxxx) tmp").
           option("user", "xxx").
           option("password", "xxx").
           load()

// Show sample records from the DataFrame

jdbcDF.show(5)
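
The same read can also be written with DataFrameReader.jdbc(url, table, properties), which takes the credentials as a java.util.Properties object instead of individual options. The sketch below only builds the URL and the properties. The helper names (JdbcSettings, sqlServerUrl, connectionProps) and the sample host, database, and table names are placeholders for illustration, not part of the answer above.

```scala
import java.util.Properties

object JdbcSettings {
  // Builds a SQL Server JDBC URL in the same shape the answer uses.
  def sqlServerUrl(host: String, port: Int, database: String): String =
    s"jdbc:sqlserver://$host:$port;databaseName=$database"

  // Collects the credentials and the driver class name in one Properties object.
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    props
  }
}

// Hypothetical values; replace them with your own server details.
val url = JdbcSettings.sqlServerUrl("myserver.example.com", 1433, "mydb")
val props = JdbcSettings.connectionProps("xxx", "xxx")

// In spark-shell, with the driver jar on the classpath:
// val df = spark.read.jdbc(url, "dbo.mytable", props)
// df.show(5)
```

Keeping the URL and credentials in one place makes it easier to reuse them for both reads and writes.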

answered Sep 18 '22 22:09

Ajay Kharade

Here are some code snippets. A DataFrame is used to create the table t2 and insert data. The SQLContext is used to load the data from the t2 table into a DataFrame. I added spark.driver.extraClassPath and spark.executor.extraClassPath to my spark-defaults.conf file so that the SQL Server JDBC driver is on the classpath.

// Spark 1.4.1 (spark-shell, where `sc` and `sqlContext` are predefined)

// Insert data from a DataFrame

case class Conf(mykey: String, myvalue: String)

val data = sc.parallelize(Seq(Conf("1", "Delaware"), Conf("2", "Virginia"), Conf("3", "Maryland"), Conf("4", "South Carolina")))

val df = data.toDF()

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val table = "t2"

// The third argument is the `overwrite` flag.
// insertIntoJDBC is deprecated since Spark 1.4 in favor of df.write.jdbc.
df.insertIntoJDBC(url, table, true)

// Load from the database using SQLContext (reuses `url` from above)

val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

// sqlContext.load is deprecated since Spark 1.4 in favor of sqlContext.read.format("jdbc").
val tbl = sqlContext.load("jdbc", Map(
  "url" -> url,
  "driver" -> driver,
  "dbtable" -> "t2",
  "partitionColumn" -> "mykey",
  "lowerBound" -> "0",
  "upperBound" -> "100",
  "numPartitions" -> "1"
))

tbl.show()
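
The partitionColumn, lowerBound, upperBound, and numPartitions options in the load call control how Spark splits the read into parallel queries. The bounds only set the width of each range; they do not filter rows. With numPartitions set to "1", as above, they have no effect. The following is a simplified sketch of that splitting, not Spark's actual implementation; PartitionSketch and predicates are hypothetical names.

```scala
object PartitionSketch {
  // Returns one WHERE predicate per partition for a numeric column.
  def predicates(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    require(upper - lower >= numPartitions, "range must be at least numPartitions wide")
    if (numPartitions == 1) {
      // A single partition reads the whole table, so no real filter is needed.
      Seq("1 = 1")
    } else {
      val stride = (upper - lower) / numPartitions
      // Cut points between neighbouring partitions.
      val cuts = (1 until numPartitions).map(i => lower + i * stride)
      // The first and last ranges are open-ended, so rows outside the bounds are still read.
      val first = s"$column < ${cuts.head} OR $column IS NULL"
      val middle = cuts.zip(cuts.tail).map { case (lo, hi) => s"$column >= $lo AND $column < $hi" }
      val last = s"$column >= ${cuts.last}"
      (first +: middle) :+ last
    }
  }
}

PartitionSketch.predicates("mykey", 0, 100, 4).foreach(println)
// mykey < 25 OR mykey IS NULL
// mykey >= 25 AND mykey < 50
// mykey >= 50 AND mykey < 75
// mykey >= 75
```

Spark's documentation describes the partition column as numeric, so a partitioned read is best keyed on a numeric column whose values are spread evenly between the two bounds.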

Some issues to consider:

Make sure the firewall allows traffic on port 1433.

If you are using Microsoft Azure SQL Server DB, tables require a primary key. Some of the methods create the table, but Spark's code does not create the primary key, so the table creation fails.
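
For the firewall point, a quick way to rule out a blocked port before debugging Spark itself is to try a plain TCP connection from the machine that runs the driver. This is a minimal sketch using only the Java standard library; PortCheck and canConnect are hypothetical names, and a successful connection only shows that the port is reachable, not that the login will work.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }
}

// Usage, with the host name taken from the JDBC URL above:
// println(PortCheck.canConnect("wcarroll3", 1433))
```

In a cluster the executors open their own connections, so the same check may need to be repeated from a worker node.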

Other details to take care of: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html

source: https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/

answered Sep 21 '22 22:09

Anurag Sharma

Tags: sql-server, data-analysis, apache-spark
