
ggplot: remove NA factor level in legend

Tags:

r

ggplot2

How can I omit the NA level of a factor from a legend?

Pesky NA legend value...

From the nycflights13 dataset, I created a new continuous variable called tot_delay and then a factor called delay_class with 4 levels. Before plotting I filter out rows where tot_delay is NA, but an NA level still appears in the legend. Here's my code:

library(nycflights13); library(ggplot2); library(dplyr)  # dplyr provides filter() and %>%

flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,                                   
                           c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
                             max(flights$tot_delay, na.rm = TRUE)),   
                           labels = c("none", "short","medium","long"))     

filter(flights, !is.na(tot_delay)) %>% 
  ggplot() +
  geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
asked Aug 03 '17 at 19:08 by Rich Pauloo


3 Answers

The question's example isn't a good illustration of the problem (unexpected NA values should, of course, be tracked down and eliminated), but this is the top result on Google, so it should be noted that there is now an option in the scale_XXX_XXX functions that prevents NA levels from appearing in the legend: set na.translate = FALSE. For example:

library(ggplot2)

# default: the NA level of `a` appears in the legend
ggplot(data = data.frame(x = c(1, 2, NA), y = c(1, 1, NA), a = c("A", "B", NA)),
       aes(x, y, colour = a)) +
  geom_point(size = 4)


# with na.translate = FALSE: the NA level is dropped from the legend
ggplot(data = data.frame(x = c(1, 2, NA), y = c(1, 1, NA), a = c("A", "B", NA)),
       aes(x, y, colour = a)) +
  geom_point(size = 4) +
  scale_colour_discrete(na.translate = FALSE)


This works in ggplot2 3.1.0.

answered Oct 16 '22 at 07:10 by gatsky


You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:

filter(flights, !is.na(delay_class)) %>% 
  ggplot() +
  geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")

does the trick.
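The stray NA most likely comes from cut() itself: its intervals are right-closed, (a, b], by default, so a value exactly equal to the lowest break (here min(tot_delay)) falls outside every interval and becomes NA even though tot_delay is not missing. A minimal sketch on a made-up vector (the values of x are hypothetical, chosen only for illustration) shows the effect, plus cut()'s include.lowest argument as one way to keep that point:

```r
# Hypothetical stand-in for tot_delay; values are made up for illustration
x <- c(-10, 0, 5, 30, 200)

# Same break pattern as the question: min, 0, 20, 120, max
brks <- c(min(x), 0, 20, 120, max(x))
lbls <- c("none", "short", "medium", "long")

# Default: intervals are (a, b], so x == min(x) matches no interval -> NA
default_cut <- cut(x, brks, labels = lbls)
print(default_cut)

# include.lowest = TRUE closes the first interval to [a, b],
# so the minimum is labelled "none" instead of NA
closed_cut <- cut(x, brks, labels = lbls, include.lowest = TRUE)
print(closed_cut)

# Count values that are present in x but became NA after cutting
sum(!is.na(x) & is.na(default_cut))  # 1
sum(!is.na(x) & is.na(closed_cut))   # 0
```

On the question's data, the analogous check would be sum(!is.na(flights$tot_delay) & is.na(flights$delay_class)), which counts the rows that slip past the tot_delay filter.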

Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:

filter(flights, !is.na(tot_delay)) %>% 
  ggplot() +
  geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
  # breaks lists only the real levels, so NA gets no legend key
  scale_fill_manual(breaks = c("none", "short", "medium", "long"),
                    values = scales::hue_pal()(4))

UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has existed since ggplot2 2.2.0; I just wasn't aware of it when I posted my answer. For completeness, its usage in the original question would look like this:

filter(flights, !is.na(tot_delay)) %>% 
  ggplot() +
  geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
  scale_fill_discrete(na.translate=FALSE)

answered Oct 16 '22 at 09:10 by Artem Sokolov


I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your data. However, sometimes you know there are NAs and you just want to exclude them. In that case, simply using na.omit() should work:

na.omit(flights) %>%
  ggplot() +
  geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
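One caveat worth keeping in mind: na.omit() drops every row that has an NA in any column, not only in delay_class, so on a wide data set like flights it can remove more rows than the legend problem requires. A minimal sketch with a small made-up data frame (the columns carrier, delay_class and note and their values are hypothetical) compares it with filtering on the one column of interest:

```r
# Hypothetical toy data: 'note' is an unrelated column with one missing value
df <- data.frame(
  carrier     = c("AA", "AA", "UA", "UA"),
  delay_class = factor(c("none", NA, "short", "long"),
                       levels = c("none", "short", "medium", "long")),
  note        = c("x", "y", NA, "z")
)

# na.omit() removes rows 2 and 3 (NA in delay_class OR in note)
omitted <- na.omit(df)
nrow(omitted)   # 2

# Filtering on delay_class removes only row 2; this is the base-R
# equivalent of dplyr::filter(df, !is.na(delay_class))
filtered <- df[!is.na(df$delay_class), ]
nrow(filtered)  # 3
```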

answered Oct 16 '22 at 09:10 by Woodstock