 

Why does coord_equal break my heatmap?

I'm trying to create a heatmap out of the following data:

> head(myData.aggregated)
             datetime value       date                time
1 2016-03-31 14:19:00     3 2016-03-31 2016-06-11 14:19:00
2 2016-03-31 14:49:00    69 2016-03-31 2016-06-11 14:49:00
3 2016-03-31 15:49:00     5 2016-03-31 2016-06-11 15:49:00
4 2016-03-31 16:19:00     7 2016-03-31 2016-06-11 16:19:00
5 2016-03-31 17:49:00     2 2016-03-31 2016-06-11 17:49:00
6 2016-03-31 18:19:00     7 2016-03-31 2016-06-11 18:19:00

> tail(myData.aggregated)
              datetime value       date                time
90 2016-04-06 13:19:00     1 2016-04-06 2016-06-11 13:19:00
91 2016-04-06 13:49:00    25 2016-04-06 2016-06-11 13:49:00
92 2016-04-06 14:19:00     7 2016-04-06 2016-06-11 14:19:00
93 2016-04-06 14:49:00     1 2016-04-06 2016-06-11 14:49:00
94 2016-04-06 22:19:00     3 2016-04-06 2016-06-11 22:19:00
95 2016-04-06 22:49:00    14 2016-04-06 2016-06-11 22:49:00

I'm plotting it with the following ggplot2 command:

ggplot(myData.aggregated, aes(x = time, y = date, fill = scale(value))) + geom_tile() + coord_equal()

As soon as I add coord_equal(), the result is a blank graph. Can someone explain why this happens and how I can fix it? My goal is a heatmap with a square tile for each 30-minute interval.

Update 1:

> dput(head(myData.aggregated))
structure(list(datetime = structure(c(1459426740, 1459428540, 
1459432140, 1459433940, 1459439340, 1459441140), class = c("POSIXct", 
"POSIXt"), tzone = ""), value = c(3L, 69L, 5L, 7L, 2L, 7L), date = structure(c(16891, 
16891, 16891, 16891, 16891, 16891), class = "Date"), time = structure(c(1465647540, 
1465649340, 1465652940, 1465654740, 1465660140, 1465661940), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("datetime", "value", "date", 
"time"), row.names = c(NA, 6L), class = "data.frame")
asked Feb 06 '23 19:02 by Oliver


1 Answer

TL;DR: The y-axis spans six units and the x-axis spans tens of thousands of units. When you add coord_equal, the y-axis gets squashed to roughly 1/10,000th the physical length of the x-axis, which effectively makes the plot area disappear. The date column (y-axis) happens to be in days and the time column (x-axis) in seconds, but ggplot treats both as unitless numbers. You can denominate the y-axis in seconds as well, but that still gives a plot with an undesirable y:x aspect ratio of at least 6:1. See below for code and additional detail.


Here's what's happening: date is in Date format and is therefore denominated in days, with a range of 6 days. time is in POSIXct format, which is denominated in seconds, with a range of tens of thousands of seconds (since we only care about the time of day, regardless of date, the maximum is 86,400 seconds, the length of one day).

The underlying values of Date and POSIXct objects are just numbers with, respectively, a Date or POSIXct class attached. When you add coord_equal, one unit on the y-axis takes up the same physical distance as one unit on the x-axis, because ggplot (apparently) applies coord_equal to the numeric values without regard to their date-time class. But the entire y-axis spans 6 units while the x-axis spans tens of thousands of units. Thus, with coord_equal, the y:x aspect ratio gets squashed to something on the order of 1:10,000, making the plot disappear for all practical purposes.
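To see the unitless numbers ggplot works with, you can strip the class with as.numeric(). This is a small sketch using dates and times from the question's range; the times are given in UTC here (an assumption, for reproducibility), whereas the question's data uses the local time zone. The value 16891 matches the date column in the dput output in the question.

```r
# Date objects are stored as days since 1970-01-01
d_start <- as.Date("2016-03-31")
d_end   <- as.Date("2016-04-06")
as.numeric(d_start)                      # 16891
as.numeric(d_end) - as.numeric(d_start)  # 6: the whole y range, in days

# POSIXct objects are stored as seconds since 1970-01-01 00:00:00 UTC
t1 <- as.POSIXct("2016-03-31 14:19:00", tz = "UTC")
t2 <- as.POSIXct("2016-03-31 22:49:00", tz = "UTC")
as.numeric(t2) - as.numeric(t1)          # 30600: an 8.5-hour span, in seconds
```

So one "unit" on the y-axis is a whole day, while one "unit" on the x-axis is a single second.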

You can denominate both the x and y axes in seconds, but even then the y-axis range (6 days) will be at least six times the x-axis range (at most one day), giving a y:x aspect ratio of at least 6:1 with coord_equal. That is better than 1:10,000, but still not very practical.
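The aspect ratios quoted above come from simple arithmetic: with coord_equal (a fixed ratio of 1), the panel's height-to-width ratio is the y range divided by the x range, in data units. The sketch below ignores axis expansion and tile sizes and assumes, for illustration, that the x-axis covers a full day (86,400 seconds); the variable names are hypothetical.

```r
# Approximate y:x aspect ratio imposed by coord_equal():
# (range of y in data units) / (range of x in data units)
secs_per_day <- 86400
y_range_days <- 6             # six days of dates on the y-axis
x_range_secs <- secs_per_day  # at most one day of clock time on the x-axis

# y stored as Date (days), x stored as POSIXct (seconds)
aspect_mixed <- y_range_days / x_range_secs
# y converted to POSIXct, so both axes are in seconds
aspect_secs <- (y_range_days * secs_per_day) / x_range_secs

sprintf("Date y-axis:    y:x = 1:%.0f", 1 / aspect_mixed)
sprintf("POSIXct y-axis: y:x = %.0f:1", aspect_secs)
```

With a full day on the x-axis the first ratio is 1:14,400; when the x-axis covers only part of the day, as in the question, it moves toward the 1:10,000 mentioned above, but the y-axis is still effectively flattened.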

Here's an example with fake data:

library(ggplot2)

# Fake data: hourly timestamps over six days, created in UTC so that
# the %% 86400 below yields the clock time of day
set.seed(4959)
dat = data.frame(datetime=seq(as.POSIXct("2016-03-31", tz="UTC"),
                              as.POSIXct("2016-04-06", tz="UTC"), by="hour"))
dat$value = sample(1:50, nrow(dat), replace=TRUE)

ggplot(dat, 
       aes(x = as.POSIXct(as.numeric(datetime) %% 86400, 
                          tz="UTC", origin=as.Date("2016-01-01")), 
           y = as.POSIXct(as.Date(datetime)), 
           fill = scale(value))) + 
  geom_tile() + 
  labs(y="Date", x="Time") + 
  scale_x_datetime(date_labels="%H:%M") +   # %M is minutes (%m would be the month)
  coord_equal()

In the code above, to create the y values we first convert datetime to Date format, which drops the time of day, and then convert back to POSIXct. That changes the unit to seconds, with every datetime value on a given date mapped to midnight of that date.

To create the x values, we want only the time of day in seconds after midnight, so we take the remainder of the numeric value of datetime after division by 86400 (the number of seconds in a day). That remainder counts from midnight UTC, so tz="UTC" is needed to get the hours right (and the hours match the clock time only if datetime is itself in UTC). The origin can be any date, since we only want the time of day; it is needed for the call to run without an error (at least in R versions before 4.3.0).
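Both transformations can be checked on a single timestamp. This is a minimal sketch that assumes the timestamp is in UTC, so that the remainder after %% 86400 equals the clock time; the variable names are hypothetical.

```r
dt <- as.POSIXct("2016-03-31 14:19:00", tz = "UTC")

# y value: drop the time of day, then go back to POSIXct (midnight, in seconds)
y <- as.POSIXct(as.Date(dt))
format(y, "%Y-%m-%d %H:%M", tz = "UTC")  # "2016-03-31 00:00"

# x value: seconds after midnight UTC, placed on an arbitrary origin date
secs <- as.numeric(dt) %% 86400          # 51540
x <- as.POSIXct(secs, tz = "UTC", origin = as.Date("2016-01-01"))
format(x, "%H:%M", tz = "UTC")           # "14:19"
```

Every timestamp on 2016-03-31 gets the same y value, and every timestamp at 14:19 gets the same x value, whatever its date, which is exactly what a date-by-time heatmap needs.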

Below is what the plot looks like with and without coord_equal. Note that with coord_equal the x-axis, which spans one day (from midnight to midnight), has the same length as one day on the y-axis. That's because we denominated both the y and x values in seconds. However, as long as the y-axis spans several days and the x-axis spans only one day, coord_equal will produce an undesirable aspect ratio.

[Image: the heatmap with and without coord_equal]

Below is a demonstration of how the y-axis gets squashed relative to the x-axis if the y values are denominated in days rather than seconds and coord_equal is specified:

# Same plot, but y is left as a Date (denominated in days)
ggplot(dat, 
       aes(x = as.POSIXct(as.numeric(datetime) %% 86400, 
                          tz="UTC", origin=as.Date("2016-01-01")), 
           y = as.Date(datetime), 
           fill = scale(value))) + 
  geom_tile() + 
  labs(y="Date", x="Time") + 
  scale_x_datetime(date_labels="%H:%M") + 
  coord_equal()

[Image: the heatmap with y in days and coord_equal; the y-axis is squashed almost flat]

answered Feb 10 '23 14:02 by eipi10