I am using Spark 2.1.0 and I am not able to create a timestamp column in PySpark. I am using the code snippet below:
df=df.withColumn('Age',lit(datetime.now()))
I am getting this error:
AssertionError: col should be Column
Please help
Syntax: to_timestamp(). This function, one of the PySpark SQL Date & Timestamp Functions, can be called in two ways. The first takes just one argument, which should be a string in the default timestamp layout 'yyyy-MM-dd HH:mm:ss.SSS'; when the input is not in this layout, it returns null. The second takes an additional format argument that describes the layout of the input.
To populate the current date and the current timestamp in PySpark, use the current_date() and current_timestamp() functions respectively. current_date() fills a column with the current date, and current_timestamp() fills a column with the current timestamp.
PySpark time format: In PySpark, time can be stored in four data types: IntegerType (typically used for storing Unix time), StringType, DateType, and TimestampType. Usually input in IntegerType or StringType is transformed into TimestampType or DateType.
Since Spark doesn't have a dedicated function for adding units such as hours or minutes to a timestamp, we use INTERVAL to do the job. Before applying INTERVAL, you first need to convert the timestamp column from string to TimestampType using cast. Here, we first create a temporary view using createOrReplaceTempView() and then use it in a SQL SELECT.
I am not sure about 2.1.0, but on 2.2.1 at least you can just do:
from pyspark.sql import functions as F
# withColumn() returns a new DataFrame, so assign the result
df = df.withColumn('Age', F.current_timestamp())
Hope it helps!
Assuming you have a DataFrame from your code snippet and you want the same timestamp for all your rows.
Let me create a dummy DataFrame.
>>> rows = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(rows)
>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>
>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+
>>> new_df.printSchema()
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
|-- time: timestamp (nullable = true)