
Timestamp parsing in pyspark

df1:

Timestamp:

1995-08-01T00:00:01.000+0000

Is there a way to extract the day of the month from the Timestamp column of a DataFrame using PySpark? I am new to Spark and have no code to show, because I do not know how to proceed.

data_person asked Aug 07 '16 01:08





1 Answer

You can parse this timestamp using unix_timestamp:

from pyspark.sql import functions as F

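# Java-style datetime pattern matching values such as 1995-08-01T00:00:01.000+0000
ts_format = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# unix_timestamp returns whole seconds since the epoch; the cast turns that into a timestamp column
df2 = df1.withColumn('Timestamp2', F.unix_timestamp('Timestamp', ts_format).cast('timestamp'))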

Then you can apply dayofmonth to the new Timestamp2 column:

df2.select(F.dayofmonth('Timestamp2'))

More details about these functions can be found in the PySpark functions documentation.
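As a quick local cross-check of the result to expect for the sample value in the question, the same string can be parsed with Python's standard library. This is only a sketch for verifying the expected day of the month, not a Spark call: the strptime directives below are assumed Python counterparts of the Java-style pattern, and the variable names are illustrative.

```python
from datetime import datetime

# Sample value taken from the question.
raw = "1995-08-01T00:00:01.000+0000"

# Python counterpart of the pattern "yyyy-MM-dd'T'HH:mm:ss.SSSZ":
# %f reads the fractional seconds, %z reads a UTC offset such as +0000.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Day of the month, the value dayofmonth should report for this row
# when the Spark session time zone is UTC.
day_of_month = parsed.day
print(day_of_month)
```

Because the sample sits one second after midnight UTC, a Spark session configured with a time zone behind UTC may report the previous day instead.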

Daniel de Paula answered Oct 18 '22 07:10
