I'd like to add an "id" annotation to certain observations in a histogram.
So far I can add the annotations without a problem, but I'd like the y position of each annotation to be the count of its bin + 1 (for aesthetic reasons).
This is what I have so far:
library(tidyverse)
library(ggrepel)
selected_obs <- c("S10", "S100", "S245", "S900")
set.seed(0)
values <- rnorm(1000)
plot_df <- tibble(id = paste0("S", 1:1000),
                  values = values) %>%
  mutate(obs_labels = ifelse(id %in% selected_obs, id, NA))
ggplot(plot_df, aes(values)) +
  geom_histogram(binwidth = 0.3, color = "white") +
  geom_label_repel(aes(label = obs_labels, y = 100))
I've seen multiple answers that annotate the count of each bin using geom_text(stat = "count", aes(y = ..count.., label = ..count..)).
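For a histogram of a continuous variable, that pattern only lines up with the bars if the label layer uses the same binning stat as geom_histogram(), i.e. stat = "bin" with the same binwidth; stat = "count" counts each distinct x value instead. A minimal sketch, assuming ggplot2 >= 3.3.0 for after_stat() (older versions use ..count..). Note that it labels every bin with its count, not individual IDs:

```r
library(ggplot2)

set.seed(0)
df <- data.frame(values = rnorm(1000))

p <- ggplot(df, aes(values)) +
  geom_histogram(binwidth = 0.3, color = "white") +
  # stat = "bin" with the same binwidth recomputes the histogram's bins,
  # so after_stat(count) is the height of each bar; + 1 lifts the label above it
  geom_text(stat = "bin", binwidth = 0.3,
            aes(y = after_stat(count + 1), label = after_stat(count)),
            vjust = 0)

bars <- layer_data(p, 1)    # computed data of the histogram layer
labels <- layer_data(p, 2)  # computed data of the text layer
```

Because both layers bin the same data with the same settings, each label's y is exactly one unit above its bar.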
Based on that, I've tried these two workarounds, without success:
geom_label_repel(stat = "count", aes(label = obs_labels, y = ..count..))
yields:
"Error: geom_label_repel requires the following missing aesthetics: label"
geom_label_repel(aes(label = obs_labels, y = ..count..))
yields:
"Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count... Did you map your stat in the wrong layer?"
Can anybody shed some light on this?
This may be a mildly misleading visualisation: you are labelling a unique ID, but by placing the label at the height of the bin count you suggest that this ID was counted that often. Anyway.
The most straightforward option is to calculate manually which bin each ID belongs to, count the observations in that bin, and then use these values to set the x and y positions of your labels.
Unfortunately, I have to use R online and cannot create a nice reprex, so I included a screenshot instead. The code should still be reproducible, as it runs online.
library(tidyverse)
library(ggrepel)
selected_obs <- c("S10", "S100", "S245", "S900")
set.seed(0)
values <- rnorm(1000)
plot_df <- tibble(id = paste0("S", 1:1000),
                  values = values) %>%
  mutate(obs_labels = ifelse(id %in% selected_obs, id, NA),
         bins = as.factor(as.numeric(cut(values, 30)))) # cutting into 30 bins
# for each selected ID, count all observations that fall into the same bin
label_df <- plot_df %>%
  filter(id %in% selected_obs) %>%
  left_join(plot_df, by = 'bins') %>%
  group_by(values = values.x, obs_labels = obs_labels.x) %>%
  count()
ggplot(plot_df, aes(values)) +
  geom_histogram(color = "white") + # removed your bin argument, as to default to 30
  geom_label(data = label_df, aes(label = obs_labels, y = n))
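The self-join followed by group_by()/count() can also be written with dplyr's add_count(), which attaches the size of each row's bin before filtering. A sketch that re-creates plot_df as above so it runs on its own; it still uses cut()'s 30 bins, so it inherits the same small mismatch with the histogram:

```r
library(dplyr)

set.seed(0)
selected_obs <- c("S10", "S100", "S245", "S900")
plot_df <- tibble(id = paste0("S", 1:1000), values = rnorm(1000)) %>%
  mutate(bins = as.factor(as.numeric(cut(values, 30))))  # 30 equal-width bins

label_df2 <- plot_df %>%
  add_count(bins) %>%               # n = number of observations in each row's bin
  filter(id %in% selected_obs) %>%  # keep only the rows to be labelled
  select(values, obs_labels = id, n)
```

To get the count + 1 offset from the question, map y = n + 1 in the geom_label() call.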
The label positions are not quite perfect. This is because I chose to cut into 30 equal bins with cut, and cut and the histogram may bin the data slightly differently. This may need some tweaking, depending on the size of your bins and on whether you include upper/lower margins.
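One way to remove that mismatch is to pass the same explicit break points to both geom_histogram() and cut(). The sketch below assumes the question's bin width of 0.3. geom_histogram() closes bins on the right by default (closed = "right"), which matches cut()'s default right = TRUE, and include.lowest = TRUE keeps the minimum value in the first bin. It uses plain geom_label(); geom_label_repel() could be swapped in:

```r
library(dplyr)
library(ggplot2)

set.seed(0)
selected_obs <- c("S10", "S100", "S245", "S900")
plot_df <- tibble(id = paste0("S", 1:1000), values = rnorm(1000))

# Break points shared by the histogram and cut(); the extra 0.3 ensures the max is covered
brks <- seq(floor(min(plot_df$values)), max(plot_df$values) + 0.3, by = 0.3)

plot_df <- plot_df %>%
  mutate(bin = cut(values, breaks = brks, include.lowest = TRUE)) %>%
  add_count(bin)  # n = height of the bar each observation falls into

label_df <- plot_df %>% filter(id %in% selected_obs)

p <- ggplot(plot_df, aes(values)) +
  geom_histogram(breaks = brks, color = "white") +
  # place each label one unit above its bar (count + 1), as asked in the question
  geom_label(data = label_df, aes(label = id, y = n + 1))
```

Printing p should then show each label sitting directly on top of its bar.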
P.S. Credit for cutting into equal bins goes to this answer by user pedrosaurio.