I am trying to connect to a database with pyspark and I am using the following code:
    sqlctx = SQLContext(sc)
    df = sqlctx.load(
        url = "jdbc:postgresql://[hostname]/[database]",
        dbtable = "(SELECT * FROM talent LIMIT 1000) as blah",
        password = "MichaelJordan",
        user = "ScottyPippen",
        source = "jdbc",
        driver = "org.postgresql.Driver"
    )
and I am getting the following error:
Any idea why this is happening?
Edit: I am trying to run the code locally on my computer.
Load data from PostgreSQL in Spark

Now we can use the same package to load data from a PostgreSQL database in Spark. The data load part will run in the Spark driver application.
Connecting to any database requires a few common properties: the database driver, the database URL, a username, and a password. PySpark code needs the same set of properties.
url: the JDBC URL of the database to connect to.
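As a minimal sketch of how these properties fit together, the helper below collects them into one dict that can be handed to the JDBC reader. The function name `jdbc_options` and the default port 5432 are assumptions for illustration, and the values are placeholders, not real credentials.

```python
def jdbc_options(host, database, user, password, table, port=5432):
    """Collect the properties a JDBC read needs into one dict (hypothetical helper)."""
    return {
        # JDBC URL format for PostgreSQL: jdbc:postgresql://host:port/database
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


# Placeholder values; replace them with your own configuration.
opts = jdbc_options("localhost", "databasename", "username", "password", "tablename")

# With an existing SparkSession named `spark`, the dict can be passed in one call:
# df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the properties in one place makes it easier to swap the password for a value read from the environment instead of hard-coding it.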
Download the PostgreSQL JDBC Driver from https://jdbc.postgresql.org/download.html
Then replace the database configuration values with your own.
    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.jars", "/path_to_postgresDriver/postgresql-42.2.5.jar") \
        .getOrCreate()

    df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://localhost:5432/databasename") \
        .option("dbtable", "tablename") \
        .option("user", "username") \
        .option("password", "password") \
        .option("driver", "org.postgresql.Driver") \
        .load()

    df.printSchema()
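The question passes a query rather than a table name as dbtable. That works with the JDBC reader as long as the query is wrapped in parentheses and given an alias. A small sketch of that wrapping follows; the helper name `as_subquery` is an assumption for illustration.

```python
def as_subquery(sql, alias="subq"):
    """Wrap a SELECT statement so it can be used as the dbtable option (hypothetical helper)."""
    # Drop surrounding whitespace and any trailing semicolon, which is not valid inside a subquery.
    cleaned = sql.strip().rstrip(";").strip()
    return f"({cleaned}) AS {alias}"


dbtable = as_subquery("SELECT * FROM talent LIMIT 1000;", alias="blah")
print(dbtable)  # (SELECT * FROM talent LIMIT 1000) AS blah

# With an existing SparkSession named `spark`, the result goes where the table name would:
# df = spark.read.format("jdbc").option("dbtable", dbtable)...
```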
More info: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html