Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stacked bar chart with varying widths in ggplot

Tags:

r

ggplot2

I try to build a stacked bar chart with varying widths, so that the width indicates the mean amount of an allocation, whereas the height indicates the numbers of allocations.

Following, you'll find my reproducible data:

procedure = c("method1","method2", "method3", "method4","method1","method2", "method3", "method4","method1","method2", "method3","method4")
sector =c("construction","construction","construction","construction","delivery","delivery","delivery","delivery","service","service","service","service") 
number = c(100,20,10,80,75,80,50,20,20,25,10,4)
amount_mean = c(1,1.2,0.2,0.5,1.3,0.8,1.5,1,0.8,0.6,0.2,0.9) 

data0 = data.frame(procedure, sector, number, amount_mean)

When using geom_bar and including widths within aes, I get the following error message:

position_stack requires non-overlapping x intervals. Furthermore, the bars are no longer stacked. 
bar<-ggplot(data=data0,aes(x=sector,y=number,fill=procedure, width = amount_mean)) + 
geom_bar(stat="identity") 

I also looked at the mekko-package, but it seems that this is only for bar charts.

Here is, what I'd like to have in the end (not based on above data):

desired Outcome (not based on above data)

Any idea how to solve my problem?

like image 557
an_ja Avatar asked Jul 02 '18 12:07

an_ja


1 Answers

I have tried the same, geom_col() as well but I've run to the same problem - with position = "stack" it seems that we can't assign a width parameter without unstacking.

But it turned up, that solution is quite simple - we can use geom_rect() to build such plot "by hand".

There are your data:

df <- data.frame(
  procedure   = rep(paste("method", 1:4), times = 3),
  sector      = rep(c("construction", "delivery", "service"), each = 4),
  amount      = c(100, 20, 10, 80, 75, 80, 50, 20, 20, 25, 10, 4),
  amount_mean = c(1, 1.2, 0.2, 0.5, 1.3, 0.8, 1.5, 1, 0.8, 0.6, 0.2, 0.9)
)

At first I have transformed your data set:

df <- df |>
  mutate(
      amount_mean = amount_mean / max(amount_mean),
      sector_num  = as.numeric(sector)
  ) |>
  arrange(desc(amount_mean)) |>
  group_by(sector) |>
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)), 
    ymax = cumsum(amount)
  ) |>
  ungroup()

What I do here:

  1. I scaled down amount_mean, so the 0 >= amount_mean <= 1 (better for plotting, anyway we don't have another scale to show the real values of amount_mean);
  2. I also decoded sector variable into numerical (for plotting, see below);
  3. I've arranged data set in descending order by amount_mean (heavy means - at the bottom, light means on the top);
  4. Grouping by sector, I calculated xmin, xmax to represent the amount_mean, and ymin, ymax for amount. The former two are a bit trickier. ymax is obviouse - you just take a cumulative sum for all amount starting from the first one. You need cumulative sum to calculate ymin as well, but starting from 0. So the first rectangle plotted with ymin = 0, second - with ymin = ymax of previouse triangle etc. All of this is performed withing each separate group of sectors.

Plot the data:

df |>
  ggplot(aes(xmin = xmin, xmax = xmax,
             ymin = ymin, ymax = ymax, 
             fill = procedure
             )
         ) +
  geom_rect() +
  scale_x_continuous(breaks = df$sector_num, labels = df$sector) +
  #ggthemes::theme_tufte() +
  theme_bw() +
  labs(title = "Question 51136471", x = "Sector", y = "Amount") +
  theme(
    axis.ticks.x = element_blank()
  )

Result:

pyramid_plot

Another option to prevent to procedure variable to be reordered. So all let say "reds" are down, "greens" above etc. But it looks ugly:

df <- df |>
  mutate(
      amount_mean = amount_mean / max(amount_mean),
      sector_num = as.numeric(sector)
  ) |>
  arrange(procedure, desc(amount), desc(amount_mean)) |>
  group_by(sector) |>
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)), 
    ymax = cumsum(amount)
  ) |>
  ungroup()

pyramid_plot_ugly

like image 143
utubun Avatar answered Nov 14 '22 21:11

utubun