
How to fill null values with a specific timestamp in a DataFrame?

I use Spark 2.1 and Python 2.7.12.

Suppose the following:

import datetime
from pyspark.sql import Row
from pyspark.sql.functions import *

data = [Row(time=datetime.datetime(2017, 1, 1, 0, 0, 0, 0)),
        Row(time=datetime.datetime(1980, 1, 1, 0, 0, 0, 0)),
        Row(time=None)]

df = spark.createDataFrame(data)

How can I use df.fillna, e.g. df.fillna({'time': datetime.datetime(1980, 1, 1, 0, 0, 0, 0)}), to fill the null value(s) with a specific time?

Leonard Aukea asked May 16 '17 08:05



1 Answer

You can use coalesce, which returns the first non-null value among its arguments:

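import datetime
from pyspark.sql.functions import coalesce, col, lit

default_time = datetime.datetime(1980, 1, 1, 0, 0, 0, 0)
# Keep the existing value where present; otherwise fall back to default_time
result = df.withColumn('time', coalesce(col('time'), lit(default_time)))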
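
To make the row-by-row effect of coalesce concrete, here is a minimal plain-Python sketch of the same "first non-null wins" rule. It does not use Spark at all; the helper name coalesce_value is made up for illustration and only mimics the semantics on an ordinary list built from the question's sample values.

```python
import datetime


def coalesce_value(*values):
    """Return the first argument that is not None (hypothetical helper,
    mimicking what coalesce does for each row)."""
    for value in values:
        if value is not None:
            return value
    return None


default_time = datetime.datetime(1980, 1, 1, 0, 0, 0, 0)

# Same three values as the 'time' column in the question
times = [
    datetime.datetime(2017, 1, 1, 0, 0, 0, 0),
    datetime.datetime(1980, 1, 1, 0, 0, 0, 0),
    None,
]

# Non-null entries are kept unchanged; only the None is replaced
filled = [coalesce_value(t, default_time) for t in times]
```

Only the third entry changes, which is the behaviour you would expect from the withColumn call on the real DataFrame.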

Or, if you want to stick with fillna, you need to pass the default value as a string in the standard yyyy-MM-dd HH:mm:ss format:

from pyspark.sql.functions import *
default_time = '1980-01-01 00:00:00'
result = df.fillna({'time': default_time})
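
If you already have the default as a datetime object, you don't have to type the string by hand. A small sketch, assuming you want exactly the format shown above (the variable name default_time_str is just for illustration):

```python
import datetime

default_time = datetime.datetime(1980, 1, 1, 0, 0, 0, 0)

# Render the datetime in the 'yyyy-MM-dd HH:mm:ss' layout used above
default_time_str = default_time.strftime('%Y-%m-%d %H:%M:%S')
```

The resulting string could then be passed in the dict, as in df.fillna({'time': default_time_str}).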
Daniel de Paula answered Oct 02 '22 12:10
