I'm working with Apache Spark 1.4.0 on Windows 7 x64 with Java 1.8.0_45 x64 and Python 2.7.10 x86 in IPython 3.2.0
I am attempting to write a DataFrame-based program in an IPython notebook that reads from and writes back to an SQL Server database.
So far I can read data from the database:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc", url="jdbc:sqlserver://serverURL",
                     dbtable="dbName.tableName",
                     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
                     user="userName", password="password")
and convert the data to a pandas DataFrame and do whatever I want with it. (Getting this far was more than a little hassle, but it works after adding Microsoft's sqljdbc42.jar to spark.driver.extraClassPath in spark-defaults.conf.)
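For reference, the classpath setting in conf/spark-defaults.conf looks like the following (the jar path below is an assumption; substitute wherever you saved sqljdbc42.jar):

```
spark.driver.extraClassPath  C:\path\to\sqljdbc42.jar
```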
The current problem arises when I go to write the data back to SQL Server with the DataFrameWriter API:
df.write.jdbc("jdbc:sqlserver://serverURL", "dbName.SparkTestTable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="userName", password="password"))
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
<ipython-input-19-8502a3e85b1e> in <module>()
----> 1 df.write.jdbc("jdbc:sqlserver://jdbc:sqlserver", "dbName.SparkTestTable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="userName", password="password"))
C:\Users\User\Downloads\spark-1.4.0-bin-hadoop2.6\python\pyspark\sql\readwriter.pyc in jdbc(self, url, table, mode, properties)
    394         for k in properties:
    395             jprop.setProperty(k, properties[k])
--> 396         self._jwrite.mode(mode).jdbc(url, table, jprop)
    397
    398
C:\Python27\lib\site-packages\py4j\java_gateway.pyc in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539
    540         for temp_arg in temp_args:
C:\Python27\lib\site-packages\py4j\protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    302                 raise Py4JError(
    303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
--> 304                     format(target_id, '.', name, value))
    305         else:
    306             raise Py4JError(
Py4JError: An error occurred while calling o49.mode. Trace:
py4j.Py4JException: Method mode([class java.util.HashMap]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Unknown Source)
The problem seems to be that py4j cannot find the java.util.HashMap class when it goes to convert my connectionProperties dictionary into a JVM object. Adding rt.jar (with its full path) to spark.driver.extraClassPath does not resolve the issue. Removing the dictionary from the write call avoids this error, but of course the write then fails due to the lack of a driver and authentication.
Edit: The o49.mode part of the error changes from run to run.
Davies Liu on the Spark users mailing list found the problem. There is a subtle difference between the Scala and Python APIs that I missed: in the Python API you must pass a mode string (such as "overwrite") as the third parameter, whereas in Scala the mode is set by a separate .mode() call. Without it, the properties dictionary gets bound to the mode parameter, which is why the JVM complained that no mode method accepts a java.util.HashMap. Changing the statement as follows resolves the issue:
df.write.jdbc("jdbc:sqlserver://serverURL", "dbName.SparkTestTable1", "overwrite",
              dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
                   user="userName", password="password"))
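The positional mix-up can be seen without Spark at all. The sketch below mimics the Python signature of DataFrameWriter.jdbc in 1.4, jdbc(url, table, mode='error', properties={}), with a stand-in function (the function body is my own illustration, not Spark's code):

```python
# Stand-in with the same parameter order as PySpark 1.4's
# DataFrameWriter.jdbc(url, table, mode='error', properties={}).
def jdbc(url, table, mode='error', properties={}):
    # Spark forwards `mode` to the JVM writer; if a dict arrives here,
    # py4j raises "Method mode([class java.util.HashMap]) does not exist".
    return type(mode).__name__

props = dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
             user="userName", password="password")

# Omitting the mode string binds the properties dict to `mode`:
print(jdbc("jdbc:sqlserver://serverURL", "tbl", props))         # dict
# Passing "overwrite" third shifts the dict into `properties`:
print(jdbc("jdbc:sqlserver://serverURL", "tbl", "overwrite", props))  # str
```

Passing the dictionary as a keyword argument (properties=props) should also sidestep the positional trap, since only mode then takes its default.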