Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faceted horizontal divergent stacked bar plot including negative values using dplyr and ggplot

I hope that this example will be clear. I would like to have stacked bars where the middle bar spans '0', for it represents a neutral value. This is used with a Likert scale. For reproducibility am using the diamonds data set.

The following example is close enough to my use case and demonstrates the difficulty I'm having getting the "good" or "positive" data to be in the correct order (so that neutral is closest to 0).

Here is my code:

require(tidyverse)

diamonds_new <- diamonds %>%
  mutate(quality = fct_recode(cut, "Very poor" = "Fair", "Poor" = "Good", "Neutral" = "Very Good", "Good" = "Premium", "Excellent" = "Ideal")) %>% 
  select(color, clarity, quality) %>% 
  group_by(color, clarity, quality) %>% count()

diamonds_bad <- 
  diamonds_new %>% filter(quality %in% c("Very poor", "Poor", "Neutral")) %>% 
  mutate(n = ifelse(quality == "Neutral", -n/2, -n))

diamonds_good <- 
  diamonds_new %>% filter(quality %in% c("Neutral", "Good", "Excellent")) %>% 
  mutate(n = ifelse(quality == "Neutral", n/2, n)) # %>% 
#  arrange(color, clarity, desc(quality))  # this doesn't seem to make a difference

ggplot() + geom_col(data = diamonds_bad, aes(x=color, y = n, fill = quality)) +  
  geom_col(data = diamonds_good, aes(x=color, y = n, fill = quality)) + 
  facet_grid(. ~ clarity, scales = "free") + 
  coord_flip()

enter image description here I have also tried using scale_fill_manual() but haven't found a way for that to work, either.

I believe that this is more complicated than some existing examples that don't have the complication of negative values or of needing to span 0. Using the current version of ggplot, what am I missing?

Also, am I correct that the positive and negative set need to be split or, at least, that it is easier to do so?

like image 786
Jonathan Sibley Avatar asked Oct 17 '22 16:10

Jonathan Sibley


2 Answers

The columns created by geom_colare formed using position_stack which stacks positive and negative values separately where positive values stack upwards and negative values downwards. The center group, Neutral in this example, is made to span 0 by setting it equal to half of its original value and then plotting it as both a positive and negative value. Also, the order of the groups will need to be reversed for the positive values.

This approach would be helpful for presenting results of some surveys I work with so I've made it into a function to make it more general.

library(tidyverse)
#
# summarize groups and save counts in variable quality_cnt
#
  diamonds_cnt <- diamonds %>%
    mutate(quality = fct_recode(cut, "Very_Poor" = "Fair", "Poor" = "Good",
                                "Neutral" = "Very Good", "Good" = "Premium", "Excellent" = "Ideal")) %>%
    select(color, clarity, quality) %>%
    group_by(color, clarity, quality) %>% summarize(quality_cnt = n())

# make function to plot counts    

  plot_ratings <- function(survey, rated_item, rating_cnt, rating, rating_cat, facet = "wrap") {
#
#  Input:   
#         rated_item  =  unquoted variable name of rated items
#         rating = unquoted variable name of ratings for each rated_items; 
#                  variable should be a factor ordered from lowest to highest 
#         rating_cnt = unquoted variable name of counts or frequencies for each rated_item 
#         rated_cat = unquoted variable name of categories of rated items
#         facet  = "grid" for all panels on one row or 
#                   "wrap" to spread panels across multiple rows
#
#  make arguments quosures
#
    rated_item <- enquo(rated_item)
    rating_cnt <- enquo(rating_cnt)  
    rating <- enquo(rating)
    rating_cat <- enquo(rating_cat)
#
# If number of rating levels is odd, find middle rating
#
  rating_levels <- levels(pull(survey, !!rating))
  mid_level <-  ceiling(length(rating_levels)/2)
  mid_rating <- ifelse(length(rating_levels)%%2 == 1, rating_levels[mid_level], NA_character_)  
#
# make local variabels for use with aes
# plot positive and negative columns separately
#
  survey <- survey %>% mutate( rating_plt = !!rating, rating_cnt_plt = !!rating_cnt)

  sp <- ggplot(survey, aes_(x = rated_item,  fill = rating)) + 
        geom_col(data=filter(survey, !!rating %in% tail(rating_levels, mid_level)),
                 aes( y = ifelse(rating_plt == mid_rating, .5*rating_cnt_plt, rating_cnt_plt)),
                 position = position_stack(reverse = TRUE )) +
        geom_col(data=filter(survey, !!rating %in% head(rating_levels, mid_level)),
                 aes( y = ifelse(rating_plt == mid_rating, -.5*rating_cnt_plt, -rating_cnt_plt)),
                 position = "stack") +
        labs(y = rating_cnt) +
        scale_fill_brewer(palette = "RdYlGn", direction = -1) +
        coord_flip() +
        switch(facet,
               grid = facet_grid( facets=rating_cat, scales = "free_x"),
               wrap = facet_wrap( facets=rating_cat, scales = "free_x"))
  plot(sp)
  } 
#
#  Use function to make charts
#
  plot_ratings(diamonds_cnt,  rated_item = color, rating_cnt = quality_cnt, 
               rating = quality, rating_cat = clarity, facet = "wrap")

which gives the chart

enter image description here

like image 88
WaltS Avatar answered Oct 31 '22 02:10

WaltS


Something like this - my key change is to go away from geom_col to geom_rectangle where you get to freely control start and end.

diamonds_new <-  diamonds %>%
  mutate(quality = fct_recode(cut, "Very poor" = "Fair", "Poor" = "Good", "Neutral" =     "Very Good", "Good" = "Premium", "Excellent" = "Ideal")) %>% 
  select(color, clarity, quality) %>% 
  group_by(color, clarity, quality) %>% 
  count() %>% 
  group_by(color, clarity) %>% 
  arrange(quality) %>%
  mutate(end = cumsum(n)) %>%  
  mutate(start = end-n) %>%
  mutate(offset = (end[quality=="Neutral"] + start[quality=="Neutral"])/2) %>%   
  mutate(start = start - offset,
         end = end - offset) %>%
  mutate(colStart = as.numeric(color) + 0.25,
         colEnd = as.numeric(color) + 0.75)

Having seen the second answer (and not seeing any input from OP) I also included an alternative facetting.

ggplot() + 
    geom_rect(data = diamonds_new, aes(xmin=colStart, xmax=colEnd, ymin=start, ymax=end, fill = quality)) +  
    facet_wrap(. ~ clarity, scales="free_x") +        
    coord_flip()

WaltS's answer retains the factor levels on the y-axis which is certainly closer to the original question. Yet it requires extensive changing of the data so I guess there is value in retaining my alternative answer.

enter image description here

If the zero needs to be centered in the panels you need to adjust xlim accordingly.

like image 41
CMichael Avatar answered Oct 31 '22 01:10

CMichael