
ggplot2 density of circular data

I have a data set where x represents day of year (say birthdays) and I want to create a density graph of this. Further, since I have some grouping information (say boys or girls), I want to use the capabilities of ggplot2 to make a density plot.

Easy enough at first:

library(ggplot2); library(dplyr)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))
bdays %>% ggplot(aes(x = bday)) + geom_density(aes(color = factor(gender)))

However, this gives a poor estimate because of edge effects. I want to treat the data as circular, so that 365 + 1 = 1: one day after December 31st is January 1st. I know that the circular package provides this functionality, but I haven't had any success implementing it using a stat_function() call. It's particularly useful for me to use ggplot2 because I want to be able to use facets, aes calls, etc.

Also, for clarification, I would like something that looks like geom_density -- I am not looking for a polar plot like the one shown at: Circular density plot using ggplot2.

mbarete asked Mar 28 '16 16:03



1 Answer

To remove the edge effects you could stack three copies of the data (shifted by 0, 365, and 730 days), compute the density estimate on the stacked data, and then show the density only over the middle copy. That guarantees "wrap around" continuity of the density function from one edge to the other, because the middle copy has a full copy of the data on each side.

Below is an example comparing your original plot with the new version. I've used the adjust parameter to put the two plots on comparable bandwidths. Note also that in the circularized version the three copies share the total probability mass, so you'll need to renormalize the densities (multiply by 3) if you want them to integrate to 1 over the year.

library(ggplot2); library(dplyr)  # dplyr provides bind_rows() and %>%

set.seed(105)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))

# Stack three copies of the data, with adjusted values of bday
bdays = bind_rows(bdays, bdays, bdays)
bdays$bday = bdays$bday + rep(c(0,365,365*2),each=100)

# Function to adjust bandwidth of density plot
# Source: http://stackoverflow.com/a/24986121/496488
bw = function(b,x) b/bw.nrd0(x)

# New "circularized" version of plot
bdays %>% ggplot(aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(10, bdays$bday[1:100])) +
  coord_cartesian(xlim=c(365, 365+365+1), expand=FALSE) +
  scale_x_continuous(breaks=seq(366+89, 366+365, 90), labels=seq(366+89, 366+365, 90)-365) +
  scale_y_continuous(limits=c(0,0.0016)) +
  ggtitle("Circularized")

# Original plot
ggplot(bdays[1:100,], aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(30, bdays$bday[1:100])) +
  scale_x_continuous(breaks=seq(90,360,90), expand=c(0,0)) +
  ggtitle("Not Circularized")
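
The renormalization mentioned above can be checked outside ggplot2. The following is only a minimal sketch under stated assumptions: it calls stats::density() directly instead of geom_density(), uses a fixed bandwidth of 30 days chosen purely for illustration, ignores the gender grouping, and the names period, stacked and circ are made up for this example.

```r
# Minimal sketch: circular density by tripling the data, then renormalizing.
# Assumptions: fixed bandwidth of 30 days, no grouping, illustrative names.
set.seed(105)
period <- 365
bday <- sample(1:period, 100, replace = TRUE)

# Stack three copies of the data, shifted by 0, 1 and 2 periods
stacked <- c(bday, bday + period, bday + 2 * period)

# Estimate the density on the stacked data, evaluated only over the middle copy
dens <- density(stacked, bw = 30, from = period, to = 2 * period, n = 512)

# The middle copy holds about one third of the total mass, so multiply by 3
# and shift the x values back onto the 0..365 scale
circ <- data.frame(day = dens$x - period, density = dens$y * 3)

# Trapezoidal approximation of the area under the renormalized curve;
# it should come out close to 1
n <- nrow(circ)
area <- sum(diff(circ$day) * (circ$density[-1] + circ$density[-n]) / 2)
area

# The two ends of the curve (day 0 and day 365) should nearly coincide
c(start = circ$density[1], end = circ$density[n])
```

Within ggplot2 itself, one way to get the same rescaling would presumably be to map y = after_stat(density) * 3 inside geom_density() (available in ggplot2 3.3.0 and later) and raise the y-axis limit accordingly.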


eipi10 answered Oct 03 '22 04:10
