
Unable to compare dates in Spark SQL query

Using PySpark with the JDBC driver for MySQL, I am unable to query columns of type date: a java.lang.ClassCastException is thrown.

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc", url=url, dbtable="reports")
sqlContext.registerDataFrameAsTable(df, "reports")
df.printSchema()
# root
#  |-- id: integer (nullable = false)
#  |-- day: date (nullable = false)
query = sqlContext.sql("select * from reports where day > '2015-05-01'")
query.collect() # ... most recent failure: ... java.lang.ClassCastException

Changing the day column's type to timestamp solves the problem, but I need to keep the original schema.

michal.dul asked Feb 12 '26 01:02
1 Answer

Looking at the relevant unit tests in the Spark source, it looks like you need an explicit cast:

select * from reports where day > cast('2015-05-01' as date)

There's no sign of it in the Spark SQL documentation, but it seems to have been available in Transact-SQL and Hive for some time.
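If the cutoff date comes from Python code rather than a hard-coded literal, one way to build the query is to format a datetime.date into the cast expression. This is just a sketch; day_filter is a hypothetical helper, not part of Spark:

```python
import datetime

def day_filter(cutoff):
    # Format the Python date as the 'YYYY-MM-DD' literal the cast expects.
    return "select * from reports where day > cast('%s' as date)" % cutoff.isoformat()

query_str = day_filter(datetime.date(2015, 5, 1))
# query_str == "select * from reports where day > cast('2015-05-01' as date)"
# The string would then be passed to sqlContext.sql(query_str) as in the question.
```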

Spiro Michaylov answered Feb 16 '26 11:02

