
Defining a DataTypes.DateType

I am trying to learn Spark. I have data that I am reading in as a DataFrame, and I am then trying to filter it using a condition on an org.apache.spark.sql.Column:

val resultDataFrame = dataFrame.filter(col("DATECOL") >= date)

DATECOL is read into the DataFrame as DataTypes.DateType. date is a variable that I have to hardcode.

What I am trying to figure out is how to define date, i.e. how to create an instance of DataTypes.DateType, or convert to it from a String or similar, so that I can run the expression above. I tried using a String; it does not give an error, but it returns no results where it should.

rgamber asked Jun 11 '26 16:06


1 Answer

You can make it a java.sql.Date:

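// spark-shell already has these in scope; a standalone application needs them
// explicitly (spark is the active SparkSession).
import org.apache.spark.sql.functions.to_date
import spark.implicits._

val df = Seq(("2016-10-10", 2), ("2017-02-02", 10)).toDF("DATECOL", "value")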

val df1 = df.withColumn("DATECOL", to_date($"DATECOL"))
// df1: org.apache.spark.sql.DataFrame = [DATECOL: date, value: int]

df1.show
+----------+-----+
|   DATECOL|value|
+----------+-----+
|2016-10-10|    2|
|2017-02-02|   10|
+----------+-----+

val date = java.sql.Date.valueOf("2016-11-01")
// date: java.sql.Date = 2016-11-01

df1.filter($"DATECOL" > date).show
+----------+-----+
|   DATECOL|value|
+----------+-----+
|2017-02-02|   10|
+----------+-----+
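
If the hard-coded value does not start out as a yyyy-MM-dd String, java.sql.Date.valueOf(String) throws an IllegalArgumentException, so the String has to be parsed with an explicit pattern first. The following is a minimal sketch of that conversion using only the JDK. The object name DateLiterals, the helper names fromIso and fromPattern, and the MM/dd/yyyy pattern are assumptions chosen for illustration; the question does not say which format the original String used.

```scala
import java.sql.Date
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helpers (not part of Spark) for building a java.sql.Date
// that can be compared against a DateType column.
object DateLiterals {
  // ISO-style strings (yyyy-MM-dd) can be handed to Date.valueOf directly.
  def fromIso(s: String): Date = Date.valueOf(s)

  // Any other layout needs an explicit pattern: parse to a LocalDate first,
  // then convert it with the Date.valueOf(LocalDate) overload.
  def fromPattern(s: String, pattern: String): Date =
    Date.valueOf(LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern)))
}
```

With the DataFrame above, the cutoff could then be built as val cutoff = DateLiterals.fromPattern("11/01/2016", "MM/dd/yyyy") and used as df1.filter($"DATECOL" >= cutoff).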
Psidom answered Jun 13 '26 06:06


