I am trying to learn Spark. I have an org.apache.spark.sql.Column which I am reading in as a DataFrame, and I am then trying to filter the DataFrame using a condition on a column:
val resultDataFrame = dataFrame.filter(col("DATECOL") >= date)
DATECOL is read into the DataFrame as DataTypes.DateType. date is a variable that I have to hardcode.
What I am trying to figure out is how to define date, i.e. how to create an instance of DataTypes.DateType, or convert to it from a String, so that I can run the above expression. I tried using a String, and it does not give an error, but it returns no results where it should.
You can make it a java.sql.Date:
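// In spark-shell these are already in scope; in a standalone application add:
// import spark.implicits._
// import org.apache.spark.sql.functions.to_date
val df = Seq(("2016-10-10", 2), ("2017-02-02", 10)).toDF("DATECOL", "value")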
val df1 = df.withColumn("DATECOL", to_date($"DATECOL"))
// df1: org.apache.spark.sql.DataFrame = [DATECOL: date, value: int]
df1.show
+----------+-----+
|   DATECOL|value|
+----------+-----+
|2016-10-10|    2|
|2017-02-02|   10|
+----------+-----+
val date = java.sql.Date.valueOf("2016-11-01")
// date: java.sql.Date = 2016-11-01
df1.filter($"DATECOL" > date).show
+----------+-----+
|   DATECOL|value|
+----------+-----+
|2017-02-02|   10|
+----------+-----+
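If you would rather keep the hardcoded value as a String, you can also let Spark do the conversion. Below is a minimal sketch, not the only way to do it. The object name DateFilterSketch and its helper methods are made up for illustration, and the local SparkSession is an assumption (in spark-shell you already have spark). It uses the question's >= comparison and builds the literal three ways: from a java.sql.Date, with to_date on a string literal, and with a cast to "date".

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, to_date}

// Hypothetical helper object; the names are illustrative, not part of Spark.
object DateFilterSketch {

  // Same sample data as above, with DATECOL converted to DateType.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("2016-10-10", 2), ("2017-02-02", 10))
      .toDF("DATECOL", "value")
      .withColumn("DATECOL", to_date(col("DATECOL")))
  }

  // Option 1: parse on the JVM side and wrap the java.sql.Date in a literal.
  def viaJavaDate(df: DataFrame, isoDate: String): DataFrame =
    df.filter(col("DATECOL") >= lit(Date.valueOf(isoDate)))

  // Option 2: keep the String and convert it with to_date on the Spark side.
  def viaToDate(df: DataFrame, isoDate: String): DataFrame =
    df.filter(col("DATECOL") >= to_date(lit(isoDate)))

  // Option 3: keep the String and cast the literal to DateType.
  def viaCast(df: DataFrame, isoDate: String): DataFrame =
    df.filter(col("DATECOL") >= lit(isoDate).cast("date"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-filter-sketch")
      .getOrCreate()

    val df = sampleData(spark)
    viaJavaDate(df, "2016-11-01").show()
    viaToDate(df, "2016-11-01").show()
    viaCast(df, "2016-11-01").show()

    spark.stop()
  }
}
```

With the sample data, each of the three filters should keep only the 2017-02-02 row. All three compare a date with a date, so the result does not depend on how Spark would implicitly coerce a plain String against a date column.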