 

Using pyspark to connect to PostgreSQL


I am trying to connect to a database with pyspark and I am using the following code:

```python
sqlctx = SQLContext(sc)
df = sqlctx.load(
    url = "jdbc:postgresql://[hostname]/[database]",
    dbtable = "(SELECT * FROM talent LIMIT 1000) as blah",
    password = "MichaelJordan",
    user = "ScottyPippen",
    source = "jdbc",
    driver = "org.postgresql.Driver"
)
```

and I am getting the following error:

[Image: screenshot of the error message]

Any idea why this is happening?

Edit: I am trying to run the code locally on my computer.

Mpizos Dimitris asked Jan 22 '16 13:01




1 Answer

Download the PostgreSQL JDBC Driver from https://jdbc.postgresql.org/download.html

Then replace the database configuration values with your own.

```python
from pyspark.sql import SparkSession

# spark.jars adds the downloaded driver jar to the driver and executor classpaths
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "/path_to_postgresDriver/postgresql-42.2.5.jar") \
    .getOrCreate()

# Read a PostgreSQL table through Spark's built-in JDBC data source
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/databasename") \
    .option("dbtable", "tablename") \
    .option("user", "username") \
    .option("password", "password") \
    .option("driver", "org.postgresql.Driver") \
    .load()

df.printSchema()
```
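The same read can also be written with `DataFrameReader.jdbc`, passing the credentials as a properties dict. This is a sketch that assumes Spark 2.x or later and a `spark` session built as in the answer. The helper names (`postgres_jdbc_url`, `subquery_dbtable`, `connection_properties`, `read_postgres`) and reading credentials from `PGUSER`/`PGPASSWORD` environment variables are hypothetical choices for illustration. It also shows how the question's `(SELECT * FROM talent LIMIT 1000) as blah` fits into `dbtable`: Spark places that value in the FROM clause of its own query, so a parenthesised, aliased subquery works where a table name would.

```python
import os


def postgres_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL JDBC URL: jdbc:postgresql://host:port/database."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def subquery_dbtable(sql: str, alias: str) -> str:
    """Wrap a SELECT so it can be used as the JDBC `dbtable` value.

    Spark embeds `dbtable` in the FROM clause of its own query, so the
    subquery is parenthesised and aliased, and a trailing ';' is dropped.
    """
    return f"({sql.strip().rstrip(';').strip()}) AS {alias}"


def connection_properties(user: str, password: str) -> dict:
    """JDBC connection properties; the driver class is the one from the answer."""
    return {"user": user, "password": password, "driver": "org.postgresql.Driver"}


def read_postgres(spark, host: str, database: str, sql: str, alias: str = "blah"):
    """Load the result of `sql` into a DataFrame.

    Assumption for this sketch: credentials come from the PGUSER and
    PGPASSWORD environment variables instead of being hard-coded.
    """
    return spark.read.jdbc(
        url=postgres_jdbc_url(host, database),
        table=subquery_dbtable(sql, alias),
        properties=connection_properties(
            os.environ["PGUSER"], os.environ["PGPASSWORD"]
        ),
    )
```

For the question's query this would be called as `read_postgres(spark, "[hostname]", "[database]", "SELECT * FROM talent LIMIT 1000")`. Keeping the password in the environment rather than in the script also keeps it out of version control.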

More info: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
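As an alternative to downloading the jar by hand, Spark can fetch the driver from Maven Central through the `spark.jars.packages` setting (the equivalent of the `--packages` flag of `pyspark` and `spark-submit`). This is a minimal sketch that assumes the machine can reach Maven Central. `org.postgresql:postgresql` is the driver's Maven coordinate. The helper names and the default version 42.2.5, taken from the jar in the answer, are illustrative.

```python
def postgres_package(version: str = "42.2.5") -> str:
    """Maven coordinate of the PostgreSQL JDBC driver (groupId:artifactId:version)."""
    return f"org.postgresql:postgresql:{version}"


def build_spark_session(app_name: str = "Python Spark SQL basic example",
                        version: str = "42.2.5"):
    """Create a SparkSession that resolves the driver from Maven at startup.

    spark.jars.packages is applied when the JVM is launched, so this only
    takes effect if no SparkContext is running yet in this Python process.
    """
    # Imported lazily so the helper above can be used without pyspark installed
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", postgres_package(version))
        .getOrCreate()
    )
```

In an already running `pyspark` shell the setting comes too late. Start the shell with `pyspark --packages org.postgresql:postgresql:42.2.5` instead.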

Rafael answered Sep 28 '22 11:09