Write spark dataframe to postgres Database

Tags:

The spark cluster setting is as follows:

conf['SparkConfiguration'] = SparkConf() \
.setMaster('yarn-client') \
.setAppName("test") \
.set("spark.executor.memory", "20g") \
.set("spark.driver.maxResultSize", "20g") \
.set("spark.executor.instances", "20")\
.set("spark.executor.cores", "3") \
.set("spark.memory.fraction", "0.2") \
.set("user", "test_user") \
.set("spark.executor.extraClassPath", "/usr/share/java/postgresql-jdbc3.jar")

When I try to write the dataframe to the Postgres DB using the following code:

from pyspark.sql import DataFrameWriter
my_writer = DataFrameWriter(df)

url_connect = "jdbc:postgresql://198.123.43.24:1234"
table = "test_result"
mode = "overwrite"
properties = {"user":"postgres", "password":"password"}

my_writer.jdbc(url_connect, table, mode, properties)

I encounter the below error:

Py4JJavaError: An error occurred while calling o1120.jdbc.   
:java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:278)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

Can anyone provide some suggestions on this? Thank you!

827

asked Aug 08 '16 09:08

Yiliang

2 Answers

Try write.jdbc and pass the parameters individually created outside the write.jdbc(). Also check the port on which postgres is available for writing mine is 5432 for Postgres 9.6 and 5433 for Postgres 8.4.

mode = "overwrite"
url = "jdbc:postgresql://198.123.43.24:5432/kockpit"
properties = {"user": "postgres","password": "password","driver": "org.postgresql.Driver"}
data.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)

131

answered Oct 11 '22 20:10

Abhishek Pansotra

Have you downloaded the PostgreSQL JDBC Driver? Download it here: https://jdbc.postgresql.org/download.html.

For the pyspark shell you use the SPARK_CLASSPATH environment variable:

$ export SPARK_CLASSPATH=/path/to/downloaded/jar
$ pyspark

For submitting a script via spark-submit use the --driver-class-path flag:

$ spark-submit --driver-class-path /path/to/downloaded/jar script.py

answered Oct 11 '22 20:10

Mary

Related questions
                            
                                More efficient way of doing destroy_all in Rails?
                            
                                How to read WAL files in pg_xlog directory through java
                            
                                SQL constraint to prevent updating a column based on its prior value
                            
                                Iterating through PostgreSQL records. How to reference data from next row?
                            
                                PostgreSQL: how to return composite type
                            
                                Get average value from the "last" N rows in a table
                            
                                Postgresql and pgAdmin III, what is character varying[]
                            
                                Partitioned table query still scanning all partitions
                            
                                Rails schema.rb does not include new custom Postgres function
                            
                                PostgreSQL SELECT only alpha characters on a row
                            
                                Extract an int, string, boolean, etc. as its corresponding PostgreSQL type from JSON
                            
                                Copy value from other table on insert trigger, postgres
                            
                                How to use triggers to prevent duplicate records in PostgreSQL?
                            
                                What happens with duplicates when inserting multiple rows?
                            
                                How do I replace a table in Postgres?
                            
                                INSERT multiple rows from SELECT in Postgresql
                            
                                PostgreSQL multidimensional arrays
                            
                                Postgres Inner join insert into and select from syntax
                            
                                PostgreSQL getting daily, weekly, and monthly averages of the occurrences of an event in one query
                            
                                PostgreSQL Check Constraint in Liquibase

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Write spark dataframe to postgres Database

Tags:

dataframe

postgresql

apache-spark

apache-spark-sql

Yiliang

People also ask

2 Answers

Abhishek Pansotra

Mary

Recent Activity

Donate For Us