
How to add days (as values of a column) to date?

I have a problem with adding days (numbers) to date format columns in Spark. I know that there is a function date_add that takes two arguments - date column and integer:

date_add(date startdate, tinyint/smallint/int days)

I'd like to use a column value that is of type integer instead (not an integer itself).

Say I have the following dataframe:

val data = Seq(
    (0, "2016-01-1"),
    (1, "2016-02-2"),
    (2, "2016-03-22"),
    (3, "2016-04-25"),
    (4, "2016-05-21"),
    (5, "2016-06-1"),
    (6, "2016-03-21")
).toDF("id", "date")

I can simply add integers to dates:

val date_add_fun = data.select(
    $"id",
    $"date",
    date_add($"date", 1)
)

But I cannot use a column expression that contains the values:

val date_add_fun = data.select(
    $"id",
    $"date",
    date_add($"date", $"id")
)

This gives the error:

<console>:60: error: type mismatch;
 found   : org.apache.spark.sql.ColumnName
 required: Int
           date_add($"date", $"id")

Does anyone know if it is possible to use a column in the date_add function? Or is there a workaround?

asked Apr 12 '18 15:04 by Mrgr8m4

2 Answers

You can use expr:

import org.apache.spark.sql.functions.expr

data.withColumn("future", expr("date_add(date, id)")).show
// +---+----------+----------+
// | id|      date|    future|
// +---+----------+----------+
// |  0| 2016-01-1|2016-01-01|
// |  1| 2016-02-2|2016-02-03|
// |  2|2016-03-22|2016-03-24|
// |  3|2016-04-25|2016-04-28|
// |  4|2016-05-21|2016-05-25|
// |  5| 2016-06-1|2016-06-06|
// |  6|2016-03-21|2016-03-27|
// +---+----------+----------+

selectExpr can be used in a similar way:

data.selectExpr("*", "date_add(date, id) as future").show
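As a side note, assuming Spark 3.0 or later, the built-in date_add gained an overload that accepts a Column for the days argument, so the call from the question works as written:

```scala
import org.apache.spark.sql.functions.date_add

// Spark 3.0.0+ only: date_add(start: Column, days: Column) overload
data.select(
    $"id",
    $"date",
    date_add($"date", $"id") as "future"
).show
```

On Spark 2.x only the (Column, Int) signature exists, so expr or selectExpr remains the way to go there.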
answered Oct 19 '22 04:10 by Alper t. Turker


The other answers work but aren't a drop-in replacement for the existing date_add function. I had a case where expr wouldn't work for me, so here is a drop-in replacement:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.DateAdd

def date_add(date: Column, days: Column) = {
    new Column(DateAdd(date.expr, days.expr))
}

Basically, all the machinery to do this already exists in Spark; the public signature of date_add just forces the days argument to be a literal, so this wrapper builds the same DateAdd Catalyst expression directly from two columns.
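Assuming the same data DataFrame from the question, the replacement can then be called exactly like the built-in (the local definition shadows the imported functions.date_add):

```scala
// Uses the custom date_add(Column, Column) defined above
data.select(
    $"id",
    $"date",
    date_add($"date", $"id") as "future"
).show
```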

answered Oct 19 '22 03:10 by Dave DeCaprio