 

ggplot using grouped date variables (such as year_month)

I feel like this should be an easy task for ggplot, tidyverse, lubridate, but I cannot seem to find an elegant solution.

GOAL: Create a bar graph of my data aggregated (summarized) by year and month.

# Libraries
library(tidyverse)
library(lubridate)

# Data
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by="day"), 10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value)

# Summarise
df2 <- df %>%
  mutate(year = year(date), month = month(date)) %>%
  unite(year_month,year,month) %>%
  group_by(year_month) %>%
  summarise(avg = mean(value),
            cnt = n())
# Plot
ggplot(df2) +
  geom_bar(aes(x=year_month, y = avg), stat = 'identity')

When I create the year_month variable, it naturally becomes a character variable instead of a date variable. I have also tried grouping by year(date), month(date) but then I can't figure out how to use two variables as the x-axis in ggplot. Perhaps this could be solved by flooring the dates to the first day of the month...?

Jeff Parker asked Nov 27 '17 02:11



1 Answer

You were really close. The missing pieces are floor_date() and scale_x_date():

library(tidyverse)
library(lubridate)

date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by = "day"),
  10000, replace = TRUE)
value <- rnorm(10000)

df <- tibble(date, value) %>% 
  group_by(month = floor_date(date, unit = "month")) %>%
  summarize(avg = mean(value))

ggplot(df, aes(x = month, y = avg)) + 
  geom_bar(stat = "identity") + 
  scale_x_date(NULL, date_labels = "%b %y", breaks = "month")
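
The summary in the question also tracked a per-month count (cnt), which the snippet above drops. Here is a minimal sketch of the same floor_date() approach that keeps it. The names dates and monthly and the set.seed() call are my own additions for illustration, not part of the original post. geom_col() is shorthand for geom_bar(stat = "identity"), and date_breaks = "1 month" is the documented way to ask scale_x_date() for one break per month.

```r
library(tidyverse)
library(lubridate)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Same simulated data as in the question
dates <- sample(seq(as_date("2013-06-01"), as_date("2014-05-31"), by = "day"),
                10000, replace = TRUE)
df <- tibble(date = dates, value = rnorm(10000))

# Floor each date to the first day of its month; the result is still a Date,
# so ggplot2 can place it on a continuous date axis in chronological order
monthly <- df %>%
  group_by(month = floor_date(date, unit = "month")) %>%
  summarise(avg = mean(value), cnt = n())

# One bar per month, labelled like "Jun 13"
p <- ggplot(monthly, aes(x = month, y = avg)) +
  geom_col() +
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")

p
```

Because month stays a Date rather than a character string, the bars are ordered by time. A united string such as "2013_10" sorts alphabetically, which would put it before "2013_6".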


Jeffrey Girard answered Oct 07 '22 22:10