Using pyspark:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("spark play") \
    .getOrCreate()

df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:port") \
    .option("dbtable", "schema.tablename") \
    .option("user", "username") \
    .option("password", "password") \
    .load()
Rather than fetch "schema.tablename", I would prefer to grab the result set of a query.
Spark SQL also includes a data source that can read data from other databases using JDBC.
Spark SQL lets you query structured data inside Spark programs, using either SQL or the familiar DataFrame API, and it is usable from Java, Scala, Python, and R. You can also apply functions to the results of SQL queries.
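As a minimal sketch of that workflow, the snippet below registers a DataFrame as a temporary view and queries it with SQL; the view name people and the sample rows are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql example").getOrCreate()

# Build a small DataFrame and expose it to SQL as a temporary view.
people = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])
people.createOrReplaceTempView("people")

# Plain SQL over the view; the result is itself a DataFrame.
over_40 = spark.sql("SELECT name FROM people WHERE age > 40")
over_40.show()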
Spark provides an API to read from and write to external databases as Spark DataFrames. It requires the JDBC driver class and jar to be placed on the classpath, and all the connection properties to be specified, in order to load or unload data from an external data source.
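For example, here is a sketch of making the driver explicit; the driver class is the standard MySQL Connector/J class, but the jar path, host, port, and credentials below are placeholders you would adapt:

# Ship the connector jar with the job, e.g.:
#   spark-submit --jars /path/to/mysql-connector-java.jar app.py
df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/schema")
    .option("driver", "com.mysql.cj.jdbc.Driver")  # MySQL Connector/J driver class
    .option("dbtable", "schema.tablename")
    .option("user", "username")
    .option("password", "password")
    .load())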
Spark SQL can cache tables using an in-memory columnar format by calling spark.catalog.cacheTable("tableName") or dataFrame.cache().
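A brief sketch of both calls, reusing the illustrative people view from the earlier example:

spark.catalog.cacheTable("people")    # cache a registered table/view by name
df.cache()                            # or cache a DataFrame handle directly
df.count()                            # an action materializes the cache
spark.catalog.uncacheTable("people")  # release the cached table when done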
Same as in Spark 1.x, you can pass a valid subquery as the dbtable argument, for example:
...
.option("dbtable", "(SELECT foo, bar FROM schema.tablename) AS tmp")
...
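Putting it together, here is a complete sketch of reading a query's result set instead of a whole table; the columns foo and bar and the connection details are placeholders carried over from the snippets above:

# The parenthesized subquery with an alias stands in for a table name.
query = "(SELECT foo, bar FROM schema.tablename) AS tmp"

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/schema")
    .option("dbtable", query)
    .option("user", "username")
    .option("password", "password")
    .load())

On Spark 2.4 and later there is also a dedicated query option, .option("query", "SELECT foo, bar FROM schema.tablename"), which takes the bare statement without the subquery alias wrapper (and cannot be combined with dbtable).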