I'm trying to round timestamps to the nearest hour using PySpark and a UDF.
The function works properly in plain Python, but not when used through PySpark.
The input is:

date = Timestamp('2016-11-18 01:45:55')  # type is pandas._libs.tslibs.timestamps.Timestamp

The function and the UDF that wraps it:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def time_feature_creation_spark(date):
    return date.round("H").hour

time_feature_creation_udf = udf(lambda x: time_feature_creation_spark(x), IntegerType())

Then I use it in the function that feeds Spark:

data = data.withColumn("hour", time_feature_creation_udf(data["date"]))
And the error is:
TypeError: 'Column' object is not callable
The expected output is simply the hour closest to the time in the datetime (e.g. 20:45 is closest to 21:00, so it returns 21).
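For reference, a UDF version that does work, as a minimal sketch: Spark hands a TimestampType value to a Python UDF as a datetime.datetime, which has no pandas-style .round(), so the rounding here is done with plain datetime arithmetic (the names nearest_hour and nearest_hour_udf are illustrative):

from datetime import timedelta

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def nearest_hour(ts):
    # Spark passes TimestampType values as datetime.datetime;
    # adding 30 minutes and keeping the hour rounds to the
    # nearest hour (20:45 -> 21:15 -> 21).
    if ts is None:
        return None
    return (ts + timedelta(minutes=30)).hour

nearest_hour_udf = udf(nearest_hour, IntegerType())
# data = data.withColumn("hour", nearest_hour_udf(data["date"]))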
A nicer version than the unix-timestamp arithmetic (/3600*3600) is the built-in function date_trunc:
import pyspark.sql.functions as F

df = df.withColumn("hourly_timestamp", F.date_trunc("hour", df.timestamp))
Other formats besides 'hour' are:

'year', 'yyyy', 'yy', 'month', 'mon', 'mm', 'day', 'dd', 'hour', 'minute', 'second', 'week', 'quarter'
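Note that date_trunc truncates (floors), so 20:45 becomes 20:00 rather than the nearest hour the question asks for. One way to get nearest-hour behavior without a UDF is to shift the timestamp by 30 minutes before truncating; a sketch, assuming the input column is named "date" (the output column names rounded_ts and hour are illustrative):

import pyspark.sql.functions as F

# 20:45 + 30 min = 21:15, which truncates to 21:00
data = data.withColumn(
    "rounded_ts",
    F.date_trunc("hour", F.col("date") + F.expr("INTERVAL 30 MINUTES")),
).withColumn("hour", F.hour("rounded_ts"))  # -> 21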