
How to add value labels to the flows of an alluvial/Sankey plot (R, ggalluvial)?

I'm looking to label the "flow" portion of an alluvial/Sankey chart in R.

The strata (columns) can easily be labelled, but not the flows connecting them. Reading the documentation and experimenting got me nowhere.

In the sample below, I'd like "freq" to be shown as a label on each flow.


library(ggplot2)
library(ggalluvial)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")
asked Jan 02 '26 03:01 by INeedCodes


1 Answer

You can label the flows with the raw counts by adding a second geom_text() layer that uses stat = "flow":

# Save the plot as 'g' so its layer data can be extracted later
g <- ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(stat = "flow", nudge_x = 0.2) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")
g


If you want more control over how these points are labelled, you can extract the layer data and do computations on it. For example, we can compute each flow's fraction of its stratum, for the starting positions only, as follows:

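# Assume 'g' is the previous plot object saved under a variable
# layer_data() returns the first layer by default, here geom_flow()
newdat <- layer_data(g)
# Keep only the starting side of each flow
newdat <- newdat[newdat$side == "start", ]
# Group the rows by stratum within each axis position
groups <- split(newdat, interaction(newdat$stratum, newdat$x))
# Convert each flow's count to a fraction of its group's total
groups <- lapply(groups, function(dat) {
  dat$label <- dat$label / sum(dat$label)
  dat
})
newdat <- do.call(rbind, groups)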

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(data = newdat, aes(x = xmin + 0.4, y = y, label = format(label, digits = 1)),
            inherit.aes = FALSE) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")
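As a side note, the split() / lapply() / do.call(rbind, ...) steps above could probably be condensed with base R's ave(), which computes a per-group value and returns it aligned with the original rows. The sketch below runs on a made-up data frame rather than the real layer data, so the column values are purely illustrative:

```r
# Toy stand-in for the flow layer data; the values are made up for illustration.
toy <- data.frame(
  x       = c(1, 1, 1, 1),
  stratum = c("A", "A", "B", "B"),
  label   = c(30, 10, 5, 15)
)

# ave() applies FUN within each stratum/x group and returns a vector
# in the original row order, so no split/rbind step is needed.
toy$frac <- ave(toy$label, toy$stratum, toy$x,
                FUN = function(v) v / sum(v))
toy$frac
```

Unlike the split/rbind version, this keeps the original row order and row names.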


Where exactly to place the labels is still a judgement call. Placing them at the start of each flow is the easy way; if you want the labels to sit approximately in the middle of the flows and dodge one another, that requires some extra processing.
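One possible starting point for the "middle" placement is to pair the start and end rows of each flow and average their positions. This is only a sketch on made-up data: `flow_id` is a hypothetical identifier, and whether the real layer data exposes a column that pairs the two ends of a flow is an assumption you would need to check. It also does nothing about labels that overlap.

```r
# Toy stand-in for flow layer data: each flow has a "start" and an "end" row.
# flow_id is a hypothetical identifier; all numbers are made up.
toy <- data.frame(
  flow_id = c(1, 1, 2, 2),
  side    = c("start", "end", "start", "end"),
  x       = c(1.2, 1.8, 1.2, 1.8),
  y       = c(10, 40, 30, 50),
  label   = c(5, 5, 8, 8)
)

starts <- toy[toy$side == "start", ]
ends   <- toy[toy$side == "end", ]

# Pair the two ends of each flow, then average their positions.
mid <- merge(starts, ends, by = "flow_id", suffixes = c("_start", "_end"))
mid$x <- (mid$x_start + mid$x_end) / 2
mid$y <- (mid$y_start + mid$y_end) / 2
mid[, c("flow_id", "x", "y", "label_start")]
```

A data frame like `mid` could then be passed to geom_text(data = mid, aes(x = x, y = y, label = label_start), inherit.aes = FALSE), in the same way as `newdat` above.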

answered Jan 03 '26 19:01 by teunbrand


