Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to convert a timestamp into string (without changing timezone)?

I have some unix times that I convert to timestamps in sparklyr and for some reasons I also need to convert them into strings.

Unfortunately, it seems that during the conversion to string hive converts to EST (my locale).

df_new <- spark_read_parquet(sc, "/mypath/parquet_*",
                             overwrite = TRUE,
                             name = "df_new",
                             memory = FALSE,
                             options = list(mergeSchema = "true"))
> df_new %>%  
           mutate(unix_t = from_utc_timestamp(timestamp(t) ,'UTC'),
           date_str = date_format(unix_t, 'yyyy-MM-dd HH:mm:ss z'),
           date_alt = to_date(from_utc_timestamp(timestamp(t) ,'UTC'))) %>% 
    select(t, unix_t, date_str, date_alt) %>% head(5)
# Source:   lazy query [?? x 4]
# Database: spark_connection
            t unix_t              date_str                date_alt  
        <dbl> <dttm>              <chr>                   <date>    
1 1419547405. 2014-12-25 22:43:25 2014-12-25 17:43:25 EST 2014-12-25
2 1418469714. 2014-12-13 11:21:54 2014-12-13 06:21:54 EST 2014-12-13
3 1419126103. 2014-12-21 01:41:43 2014-12-20 20:41:43 EST 2014-12-20
4 1419389856. 2014-12-24 02:57:36 2014-12-23 21:57:36 EST 2014-12-23
5 1418271811. 2014-12-11 04:23:31 2014-12-10 23:23:31 EST 2014-12-10

As you can see both date_str and date_alt use the EST timezone. I need UTC here. How can I do that?

Thanks!

like image 850
ℕʘʘḆḽḘ Avatar asked Mar 19 '18 02:03

ℕʘʘḆḽḘ


1 Answers

From the Hive function reference, date_format uses Java's SimpleDateFormat, which I believe always defaults to the JVM time zone, this explains why this gets you a character string converted to your time zone.

One option is to detect the time zone and manually add the hours to get UTC.

Another option would be to use lubridate with spark_apply():

sdf_len(sc, 1) %>%
  mutate(unix_t = from_utc_timestamp(timestamp(1522371003) , 'UDT')) %>%
  spark_apply(
    function(e) {
      dplyr::mutate(
        e,
        time_str = as.character(
          lubridate::with_tz(
            as.POSIXct(unix_t, origin="1970-01-01"),
            "GMT"
          )
        )
      )
    },
    columns = c("id", "unix_t", "time_str"))
like image 166
Javier Luraschi Avatar answered Nov 01 '22 09:11

Javier Luraschi