I imported a PostgreSQL table into Spark as a DataFrame using Scala. The DataFrame looks like:
user_id | log_dt
--------|---------------------------
96      | 2004-10-19 10:23:54.0
1020    | 2017-01-12 12:12:14.931652
I am transforming this DataFrame so that log_dt has the format yyyy-MM-dd HH:mm:ss.SSSSSS. To achieve this, I used the following code to convert log_dt to a timestamp with the unix_timestamp function:
val tablereader1 = tablereader1Df.withColumn("log_dt", unix_timestamp(tablereader1Df("log_dt"), "yyyy-MM-dd HH:mm:ss.SSSSSS").cast("timestamp"))
When I try to print the tablereader1 DataFrame using the command tablereader1.show(), I get the following result:
user_id | log_dt
--------|-----------------------
96      | 2004-10-19 10:23:54.0
1020    | 2017-01-12 12:12:14.0
How can I retain the microseconds as part of the timestamp? Any suggestions are appreciated.
date_format()
You can use Spark SQL's date_format(), which accepts Java SimpleDateFormat patterns. However, SimpleDateFormat can parse fractional seconds only down to milliseconds, using the pattern letter "S".
import org.apache.spark.sql.functions._
import spark.implicits._ // to use $-notation on columns

// "S" is SimpleDateFormat's millisecond field, so microseconds are dropped
val df = tablereader1Df.withColumn("log_dt", date_format($"log_dt", "S"))
Since SimpleDateFormat stops at milliseconds, you can instead parse the string with java.time in a UDF:

// Imports
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoField

/* Commented out as per the comment about IntelliJ
spark.udf.register("date_microsec", (dt: String) => {
  val dtFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.n")
  LocalDateTime.parse(dt, dtFormatter).getLong(ChronoField.MICRO_OF_SECOND)
})
*/
import org.apache.spark.sql.functions.udf

// UDF that parses the timestamp string and returns its sub-second field
val date_microsec = udf((dt: String) => {
  // "n" reads the fractional digits as a nano-of-second value
  val dtFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.n")
  LocalDateTime.parse(dt, dtFormatter).getLong(ChronoField.MICRO_OF_SECOND)
})
Check: help in building a DateTimeFormatter pattern.
Use ChronoField.NANO_OF_SECOND instead of ChronoField.MICRO_OF_SECOND to fetch the nanosecond value in the UDF.
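For example, a nanosecond-returning variant of the same UDF might look like this (a sketch; date_nanosec is an invented name):

// Same parsing as date_microsec, but fetching ChronoField.NANO_OF_SECOND
val date_nanosec = udf((dt: String) => {
  val dtFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.n")
  LocalDateTime.parse(dt, dtFormatter).getLong(ChronoField.NANO_OF_SECOND)
})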
val df = tablereader1Df.withColumn("log_date_microsec", date_microsec($"log_dt"))