I feel like this should be an easy task for ggplot, tidyverse, lubridate, but I cannot seem to find an elegant solution.
GOAL: Create a bar graph of my data aggregated/summarized/grouped_by year and month.
# Libraries
library(tidyverse)
library(lubridate)

# Data
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by="day"), 10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value)

# Summarise
df2 <- df %>%
  mutate(year = year(date), month = month(date)) %>%
  unite(year_month, year, month) %>%
  group_by(year_month) %>%
  summarise(avg = mean(value),
            cnt = n())

# Plot
ggplot(df2) +
  geom_bar(aes(x = year_month, y = avg), stat = 'identity')
When I create the year_month variable, it naturally becomes a character variable instead of a date variable. I have also tried grouping by year(date), month(date) but then I can't figure out how to use two variables as the x-axis in ggplot. Perhaps this could be solved by flooring the dates to the first day of the month...?
You were really close. The missing pieces are floor_date() and scale_x_date():
library(tidyverse)
library(lubridate)

date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by = "day"),
               10000, replace = TRUE)
value <- rnorm(10000)

# Floor each date to the first day of its month, so the grouping key stays a Date
df <- tibble(date, value) %>%
  group_by(month = floor_date(date, unit = "month")) %>%
  summarize(avg = mean(value))

# A Date on the x-axis lets scale_x_date() control the labels and breaks
ggplot(df, aes(x = month, y = avg)) +
  geom_bar(stat = "identity") +
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")
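If you would rather keep the separate year and month columns from the question, one possible alternative is to group by both and then rebuild a real Date from them with lubridate's make_date(), so that scale_x_date() still applies. This is only a sketch: the names df_raw, df_ym and month_start are made up for the illustration, the set.seed() call is there just to make the sample reproducible, and the .groups argument assumes dplyr 1.0 or later. geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(tidyverse)
library(lubridate)

set.seed(1)  # illustration only: makes the random sample reproducible

# Same kind of data as in the question (df_raw is a made-up name)
df_raw <- tibble(
  date  = sample(seq(as_date("2013-06-01"), as_date("2014-05-31"), by = "day"),
                 10000, replace = TRUE),
  value = rnorm(10000)
)

# Group by year and month, then turn the pair back into a Date
# pointing at the first day of that month
df_ym <- df_raw %>%
  group_by(year = year(date), month = month(date)) %>%
  summarise(avg = mean(value), cnt = n(), .groups = "drop") %>%
  mutate(month_start = make_date(year, month, 1))

ggplot(df_ym, aes(x = month_start, y = avg)) +
  geom_col() +
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")
```

This ends up with the same x-axis as the floor_date() version, but keeps year, month and the per-month count (cnt) available as ordinary columns.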