I want to do the opposite of this question, and sort of the opposite of this question, though that's about legends, not the plot itself.
The other SO questions seem to be asking about how to keep unused factor levels. I'd actually like mine removed. I have several name variables and several columns (wide format) of variable attributes that I'm using to create numerous bar plots. Here's a reproducible example:
library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5))
ggplot(df, aes(x = name, y = var1)) + geom_bar(stat = "identity")
I get this:
I'd like only the names that have a value in the plotted varN column to show up in my bar plot (as in, there would be no empty space for B).
Reusing the base plot code will be quite easy if I can simply change my output file name and the y=var bit. I'd rather not have to subset my data frame just to use droplevels on the result for each plot, if possible!
Update based on the na.omit() suggestion
Consider a revised data set:
library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5), var3=c(NA,6,7))
ggplot(df, aes(x = name, y = var1)) + geom_bar(stat = "identity")
I need to use na.omit() for plotting var1 because there's an NA present. But since na.omit() only keeps rows with values present in all columns, the plot removes A as well, since it has an NA in var3. This is more analogous to my data. I have 15 total responses with NAs peppered about. I only want to remove factor levels that don't have values for the currently plotted y vector, not those that have NAs in any vector in the whole data frame.
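To make the difference concrete, here is a small base R sketch using the revised df from above. The names all_complete and var1_present are only illustrative and are not part of the original example.

```r
df <- data.frame(name = c("A", "B", "C"),
                 var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5),
                 var3 = c(NA, 6, 7))

# na.omit() keeps only rows that are complete across every column.
all_complete <- na.omit(df)
as.character(all_complete$name)   # only "C" survives

# Checking a single column keeps every row where var1 is observed.
var1_present <- df[!is.na(df$var1), ]
as.character(var1_present$name)   # "A" and "C" survive
```

The second result is the set of rows the var1 plot should be built from.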
One easy option is to use na.omit() on your data frame df to remove the rows that contain NA:
ggplot(na.omit(df), aes(x = name, y = var1)) + geom_bar(stat = "identity")
Given your update, the following
ggplot(df[!is.na(df$var1), ], aes(x = name, y = var1)) + geom_bar(stat = "identity")
works OK and only considers NA in var1. Given that you are only plotting name and var1, you could also apply na.omit() to a data frame containing only those variables:
ggplot(na.omit(df[, c("name", "var1")]), aes(x = name, y = var1)) + geom_bar(stat = "identity")
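Since the goal is to reuse the same plot code and change only the y variable, the two-column na.omit() idea can be wrapped in a small function. This is only a sketch: keep_observed() and plot_var() are hypothetical helper names, and it assumes a ggplot2 version recent enough to support the .data pronoun inside aes().

```r
library(ggplot2)

df <- data.frame(name = c("A", "B", "C"),
                 var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5),
                 var3 = c(NA, 6, 7))

# Hypothetical helper: keep the label column plus one response column,
# then drop the rows where that response is NA.
keep_observed <- function(data, var, label = "name") {
  na.omit(data[, c(label, var)])
}

# Hypothetical helper: bar plot of one response column against the labels.
plot_var <- function(data, var, label = "name") {
  ggplot(keep_observed(data, var, label),
         aes(x = .data[[label]], y = .data[[var]])) +
    geom_bar(stat = "identity") +
    labs(x = label, y = var)
}

p1 <- plot_var(df, "var1")  # built from rows A and C only
p3 <- plot_var(df, "var3")  # built from rows B and C only
```

Each plot could then be written out with ggsave() under whatever file name suits that variable.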
Notice that, when plotting, you're using only two columns of your data frame. So rather than passing the whole data.frame, you could take the relevant columns with df[, c("name", "var1")], apply na.omit to remove the unwanted rows (as Gavin Simpson suggests) with na.omit(df[, c("name", "var1")]), and then plot this data.
My R/ggplot is quite rusty, and I realise that there are probably cleaner ways to achieve this.
A lot of time has passed since this question was originally asked. In 2021, if I were handling this, I would use something like:
library(ggplot2)
library(tidyr)
df <- data.frame(name = c("A", "B", "C"), var1 = c(1, NA, 2), var2 = c(3, 4, 5))
df %>%
  drop_na(var1) %>%
  ggplot(aes(name, var1)) +
  geom_col()
Created on 2021-12-03 by the reprex package (v2.0.1)
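The same pipeline can be repeated for every response column, which is closer to the "numerous bar plots" situation in the question. The following is a sketch under a few assumptions: the column to plot is passed as a string, all_of() and the .data pronoun are used to refer to it, and the plots list and the bar_*.png file names are made up for illustration.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(name = c("A", "B", "C"),
                 var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5),
                 var3 = c(NA, 6, 7))

# Columns to plot: everything except the label column.
vars <- setdiff(names(df), "name")

# One plot per column; drop_na() only inspects the column named in `v`,
# so NAs in the other columns do not remove any rows.
plots <- lapply(vars, function(v) {
  df %>%
    drop_na(all_of(v)) %>%
    ggplot(aes(name, .data[[v]])) +
    geom_col()
})
names(plots) <- vars

# Hypothetical file names; uncomment to write one PNG per column.
# for (v in vars) ggsave(paste0("bar_", v, ".png"), plots[[v]])
```

Keeping the plots in a named list means only the vector of column names has to change when more responses are added.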