library(tidyverse)
library(lubridate)
date <- seq(ymd('2018-08-01'), ymd('2018-08-31'), by = '1 day')
c <- 21.30
x1 <- runif(length(date), 0, 20)
x2 <- rnorm(length(date), 10, 3)
x3 <- abs(rnorm(length(date), 40, 10))
data <- data.frame(c, x1, x2, x3) %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column('var')
data <- data %>%
  mutate(category1 = c('catA', 'catB', 'catB', 'catC') %>% as.factor(),
         category2 = c('catAA', 'catBA', 'catBB', 'catCA') %>% as.factor())
names(data) <- c('var', as.character(date), 'category1', 'category2')
data_long <- data %>%
  gather(date, value, -var, -category1, -category2) %>%
  mutate(date = ymd(date))
data_long %>%
  ggplot(aes(date, value, fill = category1)) +
  geom_col(position = 'stack') +
  scale_x_date(breaks = '1 week', date_labels = '%Y-%m-%d', expand = c(.01, .01)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
  labs(fill = '')
With the example data and code above I generate the following plot:
What I need to do is remove the white space between columns. I have found some similar topics, but they recommend using position_dodge(), which can't be used in my case because I already have position = 'stack', and that can't be replaced. How can I make the columns adjacent to each other?
Setting width = 1, as proposed by @camille, seems to work with the raw data, but not with data aggregated to weeks or months. Please see the code below:
data_long %>%
  mutate(date = floor_date(date, unit = 'week', week_start = 1)) %>%
  group_by(category1, date) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot(aes(date, value, fill = category1, width = 1)) +
  geom_col(position = 'stack') +
  scale_x_date(breaks = '1 month', date_labels = '%Y-%m', expand = c(.01, .01)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
  labs(fill = '')
As pointed out by @Camille, a width of 1 may refer to 1 day on a date scale. However, the following doesn't produce the expected output and returns a warning message: position_stack requires non-overlapping x intervals
data_long %>%
  mutate(date = floor_date(date, unit = 'month', week_start = 1)) %>%
  group_by(category1, date) %>%
  summarise(value = sum(value, na.rm = TRUE),
            n = n()) %>%
  ungroup() %>%
  ggplot(aes(date, value, fill = category1, width = n)) +
  geom_col(position = 'stack') +
  scale_x_date(breaks = '1 month', date_labels = '%Y-%m', expand = c(.01, .01)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
  labs(fill = '')
The docs for geom_col are more specific than what I put in my comment above. The more detailed meaning of the width parameter:
Bar width. By default, set to 90% of the resolution of the data.
In a general case, such as your first one, this probably just means the distance between one discrete case and another. But in the case of dates, which have a real resolution, this seems to refer to days. I'm not sure if there's a different way to set the resolution of the dates, such as for one unit to refer to one week, instead of one day.
I'm decreasing the alpha just to be able to see if bars overlap.
So without setting a width, this defaults to 90% of the distance between observations, i.e. 90% of one week.
library(tidyverse)
library(lubridate)
...
summarized <- data_long %>%
  mutate(date = floor_date(date, unit = 'week', week_start = 1)) %>%
  group_by(category1, date) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  ungroup()
ggplot(summarized, aes(date, value, fill = category1)) +
  geom_col(alpha = 0.6) +
  scale_x_date(breaks = '1 week', expand = c(.01, .01))
Setting width to 1 means the width is 1 day. I feel like there's a discrepancy here that someone else might be able to explain, why this is read as 1 day rather than 100% of the resolution.
ggplot(summarized, aes(date, value, fill = category1)) +
  geom_col(alpha = 0.6, width = 1) +
  scale_x_date(breaks = '1 week', expand = c(.01, .01))
So to get a width of 1 week, aka 7 days, set width to 7. Again, I think there's a bit of explanation someone else could fill in here.
ggplot(summarized, aes(date, value, fill = category1)) +
  geom_col(alpha = 0.6, width = 7) +
  scale_x_date(breaks = '1 week', expand = c(.01, .01))
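One possible explanation for the discrepancy, sketched in base R: a Date is stored as a number of days, so the x-axis is measured in days, and width is given in those same data units rather than as a fraction of anything. The default of 90% of the resolution then works out to 0.9 times the smallest gap between distinct x values. The names below (week_starts, gap_days, default_width) are only illustrative, and the gap calculation is my approximation of what "resolution" means here, not ggplot2's actual code.

```r
# Weekly bin starts, like those produced by
# floor_date(date, unit = 'week', week_start = 1) on the August 2018 data
week_starts <- seq(as.Date("2018-07-30"), by = "week", length.out = 5)

# Dates are stored as days since 1970-01-01, so one x-axis unit is one day
x_numeric <- as.numeric(week_starts)

# Smallest gap between distinct x values
# (assumed here to stand in for the "resolution of the data")
gap_days <- min(diff(sort(unique(x_numeric))))

# Default bar width: 90% of that gap, still expressed in days
default_width <- 0.9 * gap_days

gap_days       # 7
default_width  # 6.3
```

Read this way, width = 1 is one day out of a seven-day gap, and width = 7 is the value that makes weekly bars touch.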
Edit: Based on the link in my comment, the best way might just be converting the dates to strings so you can just plot on a discrete x-scale as normal. Before you call as.character, you could do whatever formatting you might want.
summarized %>%
  mutate(date = as.character(date)) %>%
  ggplot(aes(x = date, y = value, fill = category1)) +
  geom_col(width = 1)
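A small sketch of that formatting step, assuming a day.month label is wanted (the '%d.%m' pattern and the names labels and date_label are just for illustration). One thing to watch: a discrete scale orders plain strings alphabetically, so any label that isn't in ISO year-month-day form should probably be turned into a factor whose levels follow the date order.

```r
# Weekly bin starts standing in for summarized$date
week_starts <- seq(as.Date("2018-07-30"), by = "week", length.out = 5)

# Custom labels: day.month instead of the ISO date string
labels <- format(week_starts, "%d.%m")

# Sorted as plain strings, the first bar would be the wrong week
sort(labels)[1]  # "06.08"

# A factor with levels in date order keeps the bars chronological
date_label <- factor(labels, levels = unique(labels[order(week_starts)]))
levels(date_label)[1]  # "30.07"
```

Mapping date_label to x in place of the as.character(date) column should then give the same adjacent bars with the custom labels.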