
Remove unused factor levels from a ggplot bar plot

I want to do the opposite of this question, and sort of the opposite of this other question, though that one is about legends, not the plot itself.

The other SO questions seem to be asking about how to keep unused factor levels. I'd actually like mine removed. I have several name variables and several columns (wide format) of variable attributes that I'm using to create numerous bar plots. Here's a reproducible example:

library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5))
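# stat="identity" takes bar heights from var1; plain geom_bar() counts rows and rejects a y aesthetic in current ggplot2
ggplot(df, aes(x=name,y=var1)) + geom_bar(stat="identity")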

I get this:

[image: bar plot with bars for A and C and an empty space for B]

I'd like only the names that have a value in the varN column being plotted to show up in my bar plot (that is, there would be no empty space for B).

Reusing the base plot code will be quite easy if I can simply change my output file name and the y=var bit. I'd rather not have to subset my data frame and then call droplevels on the result for each plot, if possible!


Update based on the na.omit() suggestion

Consider a revised data set:

library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5), var3=c(NA,6,7))
ggplot(df, aes(x=name,y=var1)) + geom_bar(stat="identity")

I need to use na.omit() for plotting var1 because there's an NA present. But since na.omit() keeps only rows that have values in every column, the plot removes A as well, because A has an NA in var3. This is more analogous to my data: I have 15 total responses with NAs peppered about. I only want to remove factor levels that don't have values for the y vector currently being plotted, not those that have NAs in any vector in the whole data frame.
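To make the difference concrete, here is a small base-R check on the revised data showing which names survive each kind of filtering. It is only a sketch for illustration; the variable names kept_all and kept_var1 are made up here.

```r
df <- data.frame(name = c("A", "B", "C"),
                 var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5),
                 var3 = c(NA, 6, 7))

# na.omit() on the whole data frame drops every row with an NA in any column
kept_all <- as.character(na.omit(df)$name)

# What I actually want for a var1 plot: drop only the rows where var1 is NA
kept_var1 <- as.character(df$name[!is.na(df$var1)])

print(kept_all)   # "C"
print(kept_var1)  # "A" "C"
```

So na.omit(df) would leave only C, while a plot of var1 should still show A.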

Hendy asked Jul 09 '12 21:07


3 Answers

One easy option is to use na.omit() on your data frame df to remove the rows that contain an NA:

ggplot(na.omit(df), aes(x=name,y=var1)) + geom_bar(stat="identity")

Given your update, the following

ggplot(df[!is.na(df$var1), ], aes(x=name,y=var1)) + geom_bar(stat="identity")

works OK and only considers NAs in var1. Alternatively, given that you are only plotting name and var1, apply na.omit() to a data frame containing only those two variables:

ggplot(na.omit(df[, c("name", "var1")]), aes(x=name,y=var1)) + geom_bar(stat="identity")
Gavin Simpson answered Oct 20 '22 08:10


Notice that, when plotting, you're using only two columns of your data frame. Rather than passing the whole data.frame, you could take just the relevant columns, df[, c("name", "var1")], apply na.omit() to remove the unwanted rows (as Gavin Simpson suggests), giving na.omit(df[, c("name", "var1")]), and then plot that result.
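As a rough sketch of that idea, the subset-then-na.omit() step could be wrapped in a function so that only the column name changes between plots. The helper names complete_rows_for and plot_var are made up for illustration, and the use of geom_col() and .data[[...]] assumes a reasonably recent ggplot2.

```r
library(ggplot2)

df <- data.frame(name = c("A", "B", "C"),
                 var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5),
                 var3 = c(NA, 6, 7))

# Hypothetical helper: keep only `name` and the column being plotted,
# then drop the rows where that column is NA.
complete_rows_for <- function(data, y_col) {
  na.omit(data[, c("name", y_col)])
}

# Hypothetical helper: bar plot of one column, built from the filtered rows.
plot_var <- function(data, y_col) {
  ggplot(complete_rows_for(data, y_col),
         aes(x = name, y = .data[[y_col]])) +
    geom_col() +
    ylab(y_col)
}

plot_var(df, "var1")  # data passed to ggplot contains A and C only
plot_var(df, "var3")  # data passed to ggplot contains B and C only
```

Because the NA check only ever sees the two columns in use, an NA in some other column does not remove a name from the current plot.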

My R/ggplot is quite rusty, and I realise that there are probably cleaner ways to achieve this.

Tilo Wiklund answered Oct 20 '22 10:10


A lot of time has passed since this question was originally asked. In 2021, if I were handling this, I would use something like:

library(ggplot2)
library(tidyr)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5))

df %>% 
  drop_na(var1) %>% 
  ggplot(aes(name, var1)) +
  geom_col()

Created on 2021-12-03 by the reprex package (v2.0.1)

John-Henry answered Oct 20 '22 10:10