
How to make a continuous fill in a ggplot2 bar plot with one variable

Tags: r, ggplot2

I am using the movies dataset from the ggplot2movies package.

Please keep in mind that I refer to mpaa rating and user rating, which are two different things. In case you don't want to load the ggplot2movies library, here is a sample of the relevant data:

> head(subset(movies[,c(5,17)], movies$mpaa!=""))
# A tibble: 6 x 2
  rating mpaa 
   <dbl> <chr>
1    5.3 R    
2    7.1 PG-13
3    7.2 PG-13
4    4.9 R    
5    4.8 PG-13
6    6.7 PG-13

Here I make a barplot that shows the frequency of films that have any mpaa rating:

ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa)) +
  geom_bar()


Now I would like to colour the bars with a fill based on the IMDb user rating. I don't want to use factor(rating), because the rating column contains an enormous number of distinct values. However, when I try a continuous fill, as in the question "Assigning continuous fill color to geom_bar", I get the same graph as before.

ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa, fill=rating)) +
  geom_bar()+ 
  scale_fill_continuous(low="blue", high="red")

I figure it has to do with the fact that my barplot is based on the frequency of a single variable, rather than a dataframe with a count column. I could make a new dataframe of the mpaa categories and their counts, but I'd rather know how to do this graph with the original movies dataset and a single ggplot.

Edit: Using aes(mpaa, group = rating, fill = rating) gives a chart that is almost correct, except that the bars and legend are swapped.

Jared C asked Feb 20 '26 15:02


1 Answer

You can reverse the legend with + guides(fill=guide_colourbar(reverse=TRUE)). However, a colour gradient doesn't seem very informative here. Another option is to cut rating into discrete ranges, as in the example below, which gives a clearer indication of the distribution of ratings within each mpaa category. Even so, because the bars have different heights, it's still hard to see how the average rating or the distribution of ratings varies by mpaa category.

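library(tidyverse)
library(ggplot2movies)
theme_set(theme_classic())

movies %>% 
  filter(mpaa != "") %>%                       # keep only films with an mpaa rating
  # bin the user rating into 2-point ranges; fct_rev() puts the highest bin first
  mutate(rating = fct_rev(cut(rating, seq(0,ceiling(max(rating)),2)))) %>% 
  ggplot(aes(mpaa, fill=rating)) +
    # linewidth replaces size for outlines in ggplot2 >= 3.4.0
    geom_bar(colour="white", linewidth=0.2) + 
    # five colours, one per bin: two blues, yellow, two reds
    scale_fill_manual(values=c(hcl(240,100,c(30,70)), "yellow", hcl(0,100,c(70,30))))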
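
For the legend reversal mentioned at the start, a rough sketch might look like the following. It reuses the aes(mpaa, group = rating, fill = rating) mapping from the question's edit and assumes ggplot2 and ggplot2movies are installed. The names rated and p_gradient are only illustrative, and scale_fill_gradient() stands in here for the question's scale_fill_continuous(low=, high=) call.

```r
library(ggplot2)
library(ggplot2movies)

# Keep only films that have an mpaa rating (illustrative object name)
rated <- subset(movies, mpaa != "")

# group = rating makes the count stat tally each distinct rating separately,
# so every stacked segment carries a single fill value
p_gradient <- ggplot(rated, aes(mpaa, group = rating, fill = rating)) +
  geom_bar() +
  scale_fill_gradient(low = "blue", high = "red") +
  # flip the colourbar so its direction matches the stacking in the bars
  guides(fill = guide_colourbar(reverse = TRUE))

p_gradient
```

Whether the reversed colourbar reads better is a matter of taste; it only changes the legend, not the bars themselves.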


Perhaps a boxplot or violin plot would be more informative. In the boxplot example below, the box widths are proportional to the square root of the number of movies rated, due to the varwidth=TRUE argument (I'm not that wild about this because the square-root transformation is difficult to interpret, but I thought I'd put it out there as an option). In the violin plot, the area of each violin is proportional to the number of movies in each mpaa category (due to the scale="count" argument). I've also put the number of movies in each category in the x-axis label, and marked in blue the mean rating for each mpaa category.

p = movies %>% 
  filter(mpaa != "") %>% 
  group_by(mpaa) %>% 
  mutate(xlab = paste0(mpaa, "\n(", format(n(), big.mark=","), ")")) %>% 
  ggplot(aes(xlab, rating)) +
    labs(x="MPAA Rating\n(number of movies)", 
         y="Viewer Rating") +
    scale_y_continuous(limits=c(0,10))

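# Layers to mix and match: boxplot, violin, and the mean printed as blue text
pl = list(geom_boxplot(varwidth=TRUE, colour="grey70"),
          geom_violin(colour="grey70", scale="count",
                      draw_quantiles=c(0.25,0.5,0.75)),
          # fun and after_stat() replace the deprecated fun.y and ..y.. (ggplot2 >= 3.3.0)
          stat_summary(fun=mean, geom="text", aes(label=sprintf("%1.1f", after_stat(y))), 
                       colour="blue", size=3.5))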

gridExtra::grid.arrange(p + pl[-2], p + pl[-1], ncol=2)
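
If you'd like to check the numbers behind these plots, something like the sketch below should reproduce the per-category counts shown in the axis labels and the means printed in blue. The column names n_movies, mean_rating and sqrt_n are my own choices for illustration; sqrt_n is the quantity that varwidth=TRUE scales the box widths by.

```r
library(dplyr)
library(ggplot2movies)

rating_summary <- movies %>%
  filter(mpaa != "") %>%              # same filter as the plots above
  group_by(mpaa) %>%
  summarise(
    n_movies    = n(),                # count shown in the x-axis label
    mean_rating = mean(rating),       # value printed in blue
    sqrt_n      = sqrt(n_movies)      # box widths are proportional to this
  )

rating_summary
```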


eipi10 answered Feb 23 '26 07:02