I am using PySpark on Spark 1.5 on a Cloudera YARN cluster, with Python 3.3 on CentOS 6 machines. The SQL Server instance is SQL Server Enterprise 64-bit. The JDBC driver is sqljdbc4.jar, and I have added the following to my .bashrc:
export SPARK_CLASSPATH="/var/lib/spark/sqljdbc4.jar"
export PYSPARK_SUBMIT_ARGS="--conf spark.executor.extraClassPath="/var/lib/spark/sqljdbc4.jar" --driver-class-path="/var/lib/spark/sqljdbc4.jar" --jars="/var/lib/spark/sqljdbc4.jar" --master yarn --deploy-mode client"
When I launch Spark, I can see confirmation that
SPARK_CLASSPATH was detected (set to '/var/lib/spark/sqljdbc4.jar')
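As an extra sanity check (just a sketch of something I could run from the PySpark shell, not something I depend on), the driver class can be resolved through py4j; a ClassNotFoundException here would mean sqljdbc4.jar is not actually on the driver classpath:
# Hypothetical check: ask the driver JVM to resolve the SQL Server driver class
sc._jvm.java.lang.Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")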
I have a DataFrame with the following schema:
root
|-- daytetime: timestamp (nullable = true)
|-- ip: string (nullable = true)
|-- tech: string (nullable = true)
|-- th: string (nullable = true)
|-- car: string (nullable = true)
|-- min_dayte: timestamp (nullable = true)
|-- max_dayte: timestamp (nullable = true)
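For reference, a minimal one-row DataFrame with the same schema can be built like this (the sample values are made up, just something I could use to test the write path):
from datetime import datetime
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical one-row DataFrame mirroring the schema above, for testing the JDBC write
schema = StructType([
    StructField("daytetime", TimestampType(), True),
    StructField("ip", StringType(), True),
    StructField("tech", StringType(), True),
    StructField("th", StringType(), True),
    StructField("car", StringType(), True),
    StructField("min_dayte", TimestampType(), True),
    StructField("max_dayte", TimestampType(), True)])
test_df = sqlContext.createDataFrame(
    [(datetime(2016, 1, 1, 12, 0), "10.0.0.1", "4G", "t1", "c1",
      datetime(2016, 1, 1), datetime(2016, 1, 2))],
    schema)
test_df.printSchema()  # should match the schema shown above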
I have already created an empty table in my SQL Server database called dbo.shaping, where the three timestamp columns are datetime2(7) and the others nvarchar(50).
I try to export the DataFrame from PySpark using this:
properties = {"user": "<username>", "password": "<password>"}
df.write.format('jdbc').options(
    url='<IP>:1433/<dbname>',
    dbtable='dbo.shaping',
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
    properties=properties)
I get the following traceback:
Py4JError: An error occurred while calling o250.option. Trace:
py4j.Py4JException: Method option([class java.lang.String, class java.util.HashMap]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)
Is my approach at least correct, and is this error perhaps related to writing this specific type of data, i.e. an issue with the data construct rather than my code?
You cannot use a dict as a value for options. The options method expects only str arguments (see the Scala docs and the PySpark annotations) and is expanded into separate calls to the Java option method.
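Roughly speaking (a simplified sketch, not the actual Spark source), options just forwards each keyword argument to a separate Java option(String, String) call, which is why a dict value ends up as a java.util.HashMap with no matching overload:
# Simplified sketch of what DataFrameWriter.options does with its keyword arguments
writer = df.write.format('jdbc')
for key, value in {"user": "<username>", "password": "<password>"}.items():
    writer = writer.option(key, value)  # fine: both key and value are plain strings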
In current Spark versions the value is automatically converted to a string, so your code would fail silently, but that isn't the case in 1.5.
Since properties are specific to the JDBC driver anyway, you should use the jdbc method:
properties = {
    "user": "<username>",
    "password": "<password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}

df.write.jdbc(
    url='<IP>:1433/<dbname>',
    table='dbo.shaping',
    properties=properties)
though unpacking properties should work as well:
df.write.format('jdbc').options(
    url='<IP>:1433/<dbname>',
    dbtable='dbo.shaping',
    **properties)  # properties above already carries user, password and driver
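One more thing worth double checking, although it is unrelated to the error above: with the Microsoft driver the url is normally a full JDBC URL rather than a bare host. A hedged sketch, keeping the placeholders from the question:
# Sketch only: typical SQL Server JDBC URL shape plus an explicit save mode
df.write.jdbc(
    url='jdbc:sqlserver://<IP>:1433;databaseName=<dbname>',
    table='dbo.shaping',
    mode='append',  # assumes the existing empty dbo.shaping table should be kept
    properties=properties)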
In general, when you see:
py4j.Py4JException: Method ... does not exist
it usually signals a mismatch between the local Python types and the types expected by the JVM method in use.
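A cheap way to catch this class of problem before it ever reaches the JVM is to validate option values on the Python side. A hypothetical helper (not part of Spark) could look like:
# Hypothetical guard: raise a clear TypeError instead of a py4j reflection error
# when a non-string value (such as a dict) would be forwarded to option()
def string_options(**opts):
    for key, value in opts.items():
        if not isinstance(value, str):
            raise TypeError("option %r must be a str, got %s" % (key, type(value).__name__))
    return opts
which you would then use as df.write.format('jdbc').options(**string_options(...)).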
See also: How to use JDBC source to write and read data in (Py)Spark?