I use Spark 2.1 and Python 2.7.12.
Suppose the following:
import datetime
from pyspark.sql import Row
from pyspark.sql.functions import *
data = [Row(time=datetime.datetime(2017, 1, 1, 0, 0, 0, 0)), Row(time=datetime.datetime(1980, 1, 1, 0, 0, 0, 0)), Row(time=None)]
df = spark.createDataFrame(data)
How can I use df.fillna({'time': datetime.datetime(1980, 1, 1, 0, 0, 0, 0)}) to fill the null value(s) with a specific time?
You can use coalesce:
import datetime
from pyspark.sql.functions import coalesce, col, lit
default_time = datetime.datetime(1980, 1, 1, 0, 0, 0, 0)
result = df.withColumn('time', coalesce(col('time'), lit(default_time)))
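For intuition, coalesce returns, for each row, the first of its arguments that is not null. The plain-Python sketch below models that behavior on a list of values without needing a Spark session. Note that coalesce_value is a hypothetical helper written for this illustration, not part of PySpark, and it only mimics the row-wise semantics.

```python
import datetime


def coalesce_value(*values):
    """Return the first argument that is not None, or None if all are None.

    Hypothetical helper: a plain-Python model of what coalesce does per row.
    """
    for value in values:
        if value is not None:
            return value
    return None


default_time = datetime.datetime(1980, 1, 1, 0, 0, 0, 0)
times = [datetime.datetime(2017, 1, 1), datetime.datetime(1980, 1, 1), None]

# Only the None entry is replaced; existing values are left untouched.
filled = [coalesce_value(t, default_time) for t in times]
```

In the DataFrame version, col('time') plays the role of each value in the list and lit(default_time) is the fallback.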
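Or, if you want to stick with fillna, you need to pass the default value as a string in the standard timestamp format (yyyy-MM-dd HH:mm:ss), since fillna does not accept datetime objects as replacement values: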
from pyspark.sql.functions import *
default_time = '1980-01-01 00:00:00'
result = df.fillna({'time': default_time})
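If the default already exists as a datetime object, the string can be derived from it rather than typed by hand. This is a minimal sketch: to_timestamp_string is a hypothetical helper name, and the fillna call is left as a comment because it assumes the df from the question and a running Spark session.

```python
import datetime


def to_timestamp_string(dt):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS' (hypothetical helper).

    Microseconds are dropped by this format.
    """
    return dt.strftime('%Y-%m-%d %H:%M:%S')


default_time = to_timestamp_string(datetime.datetime(1980, 1, 1, 0, 0, 0, 0))

# With the DataFrame from the question, this could then be used as:
# result = df.fillna({'time': default_time})
```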