 

Time series plot gets offset by 2 hours if scale_x_datetime is used

Problem:

I'm trying to plot a time series with ggplot but the data gets offset by two hours for some reason.

Data:

interval.steps <- structure(list(interval = structure(c(1465423500, 1465423800,
1465424100, 1465424400, 1465424700, 1465425000, 1465425300, 1465425600,
1465425900, 1465426200, 1465426500, 1465426800, 1465427100), class = c("POSIXct",
"POSIXt"), tzone = ""), mean = c(0.339622641509434, 0.132075471698113,
0.150943396226415, 0.0754716981132075, 2.09433962264151, 0.528301886792453,
0.867924528301887, 0, 1.47169811320755, 0.30188679245283, 0.132075471698113,
0.320754716981132, 0.679245283018868)), .Names = c("interval",
"mean"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-13L))

> interval.steps
Source: local data frame [13 x 2]

              interval      mean
                (time)     (dbl)
1  2016-06-09 00:05:00 0.3396226 # First value: 5 minutes past midnight
2  2016-06-09 00:10:00 0.1320755
3  2016-06-09 00:15:00 0.1509434
4  2016-06-09 00:20:00 0.0754717
5  2016-06-09 00:25:00 2.0943396
6  2016-06-09 00:30:00 0.5283019
7  2016-06-09 00:35:00 0.8679245
8  2016-06-09 00:40:00 0.0000000
9  2016-06-09 00:45:00 1.4716981
10 2016-06-09 00:50:00 0.3018868
11 2016-06-09 00:55:00 0.1320755
12 2016-06-09 01:00:00 0.3207547
13 2016-06-09 01:05:00 0.6792453

Example

This works fine:

library(ggplot2)

g <- ggplot(interval.steps, aes(interval, mean))
g + geom_line()

The x-axis labels show the actual times in the data.

But this doesn't:

g <- ggplot(interval.steps, aes(interval, mean))
g + geom_line() +
    scale_x_datetime(date_labels = '%H:%M') # offsets times by -2 hours


Question

What am I doing wrong? Thanks in advance.

pfabri asked Jun 09 '16 at 22:06



2 Answers

It looks like scale_x_datetime is changing the timezone of interval from your local timezone to UTC. The function below should resolve the problem.

# Source: http://stackoverflow.com/a/11002253/496488
# Put in your local timezone. I've inserted mine just for illustration.
date_format_tz <- function(format = "%H:%M", tz = "America/Los_Angeles") {
  function(x) format(x, format, tz=tz)
}

g <- ggplot(interval.steps, aes(interval, mean))
g + geom_line() +
  scale_x_datetime(labels = date_format_tz())
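A quick way to check this explanation outside ggplot2 is to call the labelling function directly on the first and last timestamps from the question. This is a minimal base R sketch; it assumes the asker's local time zone is UTC+2 in June, and Europe/Berlin is used purely as an example of such a zone.

```r
# Same label generator as in the answer above
date_format_tz <- function(format = "%H:%M", tz = "America/Los_Angeles") {
  function(x) format(x, format, tz = tz)
}

# First and last timestamps from the question (seconds since the epoch)
breaks <- as.POSIXct(c(1465423500, 1465427100), origin = "1970-01-01", tz = "UTC")

# Formatted in UTC: the two-hour shift described in the question
utc_labels <- date_format_tz(tz = "UTC")(breaks)

# Formatted in an assumed UTC+2 zone (Europe/Berlin is only an example):
# these match the times printed in the question's data
local_labels <- date_format_tz(tz = "Europe/Berlin")(breaks)

print(utc_labels)    # [1] "22:05" "23:05"
print(local_labels)  # [1] "00:05" "01:05"
```

Passing your own zone name as tz is what makes the axis labels line up with the printed data.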
eipi10 answered Sep 30 '22 at 19:09


Time zone-independent implementation

eipi10's answer above is a good workaround. However, I wanted to avoid hardcoding a time zone into my program so that it stays reproducible in any locale. This is easy to achieve: just leave out the tz parameter:

# Generator function to create 'hh:mm' labels for the x axis
# without an explicit 'tz' specification
date_format <- function(format = "%H:%M") {
    function(x) format(x, format)
}

Advantage

The advantage of this method is that it works regardless of the time zone attribute of the original variable and of the current locale.

For example, if your time values were read in with something like this: as.POSIXct(interval, format = '%H:%M', tz = 'Pacific/Honolulu'), the graph will still be plotted with the correct x-axis labels, even if you're in, say, Zimbabwe.
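Leaving out tz works because base R's format() method for POSIXct falls back to the vector's own tzone attribute when tz is not supplied. The base R sketch below reproduces the Honolulu example with the date_format generator from this answer. It only exercises format() itself; it assumes, without showing it, that the break values ggplot2 passes to the labeller keep the data's tzone attribute.

```r
# Label generator from this answer: no explicit tz
date_format <- function(format = "%H:%M") {
  function(x) format(x, format)
}

# A time read in with an explicit, non-local time zone
honolulu <- as.POSIXct("2016-06-09 00:05", format = "%Y-%m-%d %H:%M",
                       tz = "Pacific/Honolulu")

# Without tz, format() uses attr(honolulu, "tzone"), so the label
# matches the recorded wall-clock time whatever the session's zone is
tz_free <- date_format()(honolulu)

# Forcing UTC instead shifts the label by Honolulu's 10-hour offset
forced_utc <- format(honolulu, "%H:%M", tz = "UTC")

print(tz_free)     # [1] "00:05"
print(forced_utc)  # [1] "10:05"
```

Note that defining a function called date_format in your workspace masks scales::date_format if the scales package is attached.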

pfabri answered Sep 30 '22 at 19:09