df1:
Timestamp:
1995-08-01T00:00:01.000+0000
Is there a way to extract the day of the month from the Timestamp column of the data frame using PySpark? I am not able to provide code, since I am new to Spark and do not have a clue how to proceed.
The to_date() function in PySpark is commonly used to convert a Timestamp to a Date, which it does by truncating the Timestamp column's time part. By default, to_date() expects its input in the "yyyy-MM-dd HH:mm:ss" pattern; for other patterns, pass an explicit format string.
The TimestampType column maps to Python's datetime.datetime; when rows are collected, the internal SQL object is converted into a native Python object.
In PySpark, use the date_format() function to convert a DataFrame column from Date to String format.
You can use the from_unixtime/to_timestamp functions in Spark to convert a bigint (epoch-seconds) column to a timestamp. See the Spark documentation for more details on converting between different timestamp formats.
You can parse this timestamp using unix_timestamp:
from pyspark.sql import functions as F

# Pattern matching "1995-08-01T00:00:01.000+0000"
ts_format = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
df2 = df1.withColumn('Timestamp2', F.unix_timestamp('Timestamp', ts_format).cast('timestamp'))
Then, you can use dayofmonth in the new Timestamp column:
df2.select(F.dayofmonth('Timestamp2'))
More details about these functions can be found in the pyspark functions documentation.