How to convert a timestamp into a string (without changing the time zone)?

Question

I have some Unix times that I convert to timestamps in sparklyr, and for some reason I also need to convert them into strings.

Unfortunately, it seems that during the conversion to string Hive converts them to EST (my local time zone).

df_new <- spark_read_parquet(sc, "/mypath/parquet_*",
                             overwrite = TRUE,
                             name = "df_new",
                             memory = FALSE,
                             options = list(mergeSchema = "true"))
df_new %>%
  mutate(unix_t = from_utc_timestamp(timestamp(t), 'UTC'),
         date_str = date_format(unix_t, 'yyyy-MM-dd HH:mm:ss z'),
         date_alt = to_date(from_utc_timestamp(timestamp(t), 'UTC'))) %>%
  select(t, unix_t, date_str, date_alt) %>% head(5)
# Source:   lazy query [?? x 4]
# Database: spark_connection
            t unix_t              date_str                date_alt  
        <dbl> <dttm>              <chr>                   <date>    
1 1419547405. 2014-12-25 22:43:25 2014-12-25 17:43:25 EST 2014-12-25
2 1418469714. 2014-12-13 11:21:54 2014-12-13 06:21:54 EST 2014-12-13
3 1419126103. 2014-12-21 01:41:43 2014-12-20 20:41:43 EST 2014-12-20
4 1419389856. 2014-12-24 02:57:36 2014-12-23 21:57:36 EST 2014-12-23
5 1418271811. 2014-12-11 04:23:31 2014-12-10 23:23:31 EST 2014-12-10

As you can see, both date_str and date_alt use the EST time zone. I need UTC here. How can I do that?

Thanks!

Javier Luraschi · Accepted Answer

According to the Hive function reference, date_format uses Java's SimpleDateFormat, which I believe always defaults to the JVM time zone. This explains why you get a character string converted to your local time zone.
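To illustrate the point about the default zone, here is a small base R sketch (plain R, not Spark code) that renders the first value from the question in two zones. The zone name `America/New_York` is an assumption standing in for the asker's JVM time zone.

```r
# One instant: seconds since the Unix epoch (first row of the question's data)
t <- 1419547405
instant <- as.POSIXct(t, origin = "1970-01-01", tz = "UTC")

fmt <- "%Y-%m-%d %H:%M:%S"

# The same instant, formatted in two different zones
utc_str   <- format(instant, fmt, tz = "UTC")
local_str <- format(instant, fmt, tz = "America/New_York")  # assumed JVM zone

print(utc_str)    # corresponds to unix_t in the question
print(local_str)  # corresponds to date_str in the question
```

The underlying instant is identical in both cases; only the zone used for rendering differs, which is what seems to be happening inside date_format.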

One option is to detect the time zone and manually add the hours to get UTC.
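A minimal sketch of that arithmetic in base R, using the first date_str value from the question. The zone name `America/New_York` and the variable names are assumptions for illustration; this runs locally and is not part of a Spark pipeline.

```r
# String as rendered in the local zone (date_str in the question, zone suffix dropped)
local_str <- "2014-12-25 17:43:25"
local_tz  <- "America/New_York"  # assumption: the zone the JVM runs in

# Read the wall-clock string in its own zone, so DST is resolved for that date
instant <- as.POSIXct(local_str, tz = local_tz)

# The same wall clock read as if it were UTC
wall_as_utc <- as.POSIXct(local_str, tz = "UTC")

# Offset of the local zone from UTC at that instant, in hours
offset_hours <- (as.numeric(wall_as_utc) - as.numeric(instant)) / 3600

# Subtracting the (negative) offset adds the hours back and yields the UTC wall clock
utc_str <- format(wall_as_utc - offset_hours * 3600, "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(offset_hours)
print(utc_str)
```

Computing the offset per value rather than hard-coding it keeps the result correct across daylight saving changes. Inside the Spark pipeline, a similar shift could presumably be expressed with Hive's to_utc_timestamp() before calling date_format(), though that is untested here.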

Another option would be to use lubridate with spark_apply():

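# Build a one-row Spark data frame holding a sample timestamp
sdf_len(sc, 1) %>%
  mutate(unix_t = from_utc_timestamp(timestamp(1522371003), 'UTC')) %>%
  # Format the timestamp in R on the workers instead of in Hive
  spark_apply(
    function(e) {
      dplyr::mutate(
        e,
        time_str = as.character(
          # with_tz() changes only the display zone, not the instant
          lubridate::with_tz(
            as.POSIXct(unix_t, origin = "1970-01-01"),
            "GMT"
          )
        )
      )
    },
    # Names of the columns returned by the function
    columns = c("id", "unix_t", "time_str"))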

Tags: timestamp, r, apache-spark, hive, sparklyr

ℕʘʘḆḽḘ
