I have an Oracle table with n number of records. I want to load the data from that table into a Spark DataFrame with a where/filter condition, but I do not want to load the complete table into a DataFrame and then apply the filter on it. Is there an option in spark.read.format("jdbc") (or any other solution) to do this?
Check the code below. You can write your own query inside the query variable; the WHERE clause is executed on the Oracle side, so only the matching rows are fetched. To load the data in parallel, set the partitionColumn, lowerBound, upperBound and numPartitions options.
val query = """
(select columnA,columnB from table_name
where <where conditions>) table
"""
val options = Map(
  "url" -> "<url>",
  "driver" -> "<driver class>",
  "user" -> "<user>",
  "password" -> "<password>",
  "dbtable" -> query,
  "partitionColumn" -> "<numeric column to partition on>",
  "lowerBound" -> "<lower bound value>",
  "upperBound" -> "<upper bound value>",
  "numPartitions" -> "<number of partitions>"
)
val df = spark
.read
.format("jdbc")
.options(options)
.load()
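
Alternatively, here is a minimal sketch (assuming Spark 2.4+ and placeholder connection details, table, and column names) showing two related approaches: passing the SELECT through the query option, or loading the table reference and calling .filter(), which the JDBC source pushes down to Oracle as a WHERE clause so the full table is never pulled into the cluster:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-filtered-read").getOrCreate()

// Option 1: push the WHERE clause inside the query itself ("query" option, Spark 2.4+).
val filteredViaQuery = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//<host>:1521/<service>")   // placeholder URL
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("query", "select columnA, columnB from table_name where columnB > 100")
  .load()

// Option 2: load the table reference and filter the DataFrame; simple predicates
// like this are pushed down to the database by the JDBC source.
val filteredViaPushdown = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//<host>:1521/<service>")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("dbtable", "table_name")
  .load()
  .filter("columnB > 100")   // appears as a WHERE clause in the generated SQL

filteredViaPushdown.explain(true)   // check PushedFilters in the plan to confirm the pushdown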