I have a data frame like this:
df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14])
df
  col1 col2 col3
1    a    1    j
2    b    2    k
3    c    3    l
4    d    4    m
5    a    5    n
I would like to identify the columns that contain any value from the following vector:
vals=c("a","b","n","w")
A tidy solution would be awesome!
In R, a simple way to find columns that contain missing values is to combine is.na() and colSums(). First, count the number of NAs per column; then use names() or colnames() to return the names of the columns with at least one missing value.
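A small sketch of that recipe. The data frame df_na and its values are hypothetical, made up only so there are some NAs to count:

```r
# Hypothetical example data with a few missing values
df_na <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", "c"),
  z = c(NA, NA, 6)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(df_na))
na_counts
#> x y z
#> 1 0 2

# Keep only the names of columns with at least one NA
names(df_na)[na_counts > 0]
#> [1] "x" "z"
```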
To select a column in R you can use brackets: YourDataFrame['Column'] returns the column named "Column" as a one-column data frame. With dplyr, the select() function picks columns by name or by position. For instance, select(YourDataFrame, c('A', 'B')) keeps the columns named "A" and "B".
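A short sketch on the question's df, contrasting single and double brackets and select() by name or position. It assumes R >= 4.0, where data.frame() keeps strings as character rather than converting them to factors:

```r
library(dplyr)

df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])

# Single brackets: a one-column data frame
df["col1"]

# Double brackets (or $): the column itself, as a vector
df[["col1"]]
#> [1] "a" "b" "c" "d" "a"

# dplyr::select() by name, or by position
select(df, col1, col3)
select(df, c(1, 3))
```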
You can use the following basic syntax to find the rows of a data frame in R in which a certain value appears in any of the columns:
library(dplyr)
df %>% filter_all(any_vars(. %in% c('value1', 'value2', ...)))
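Applied to the question's df and vals, that syntax keeps matching rows rather than columns. A sketch, assuming dplyr >= 1.0.4, which also shows if_any(), the current replacement for the superseded filter_all()/any_vars() pair:

```r
library(dplyr)

df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# Superseded scoped-verb form, as in the text above
rows_scoped <- df %>% filter_all(any_vars(. %in% vals))

# Equivalent with if_any(): keep a row if any column's value is in vals
rows_hit <- df %>% filter(if_any(everything(), ~ .x %in% vals))

# Both keep original rows 1, 2 and 5 (col1 is "a"/"b", or col3 is "n")
rows_hit
```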
The %in% operator is built into R. For each element of the vector on its left, it returns TRUE if that element appears in the vector on its right and FALSE otherwise, so x %in% y tells you which values of x are also present in y (for character values as well as numbers).
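A quick illustration with the question's vals. Note that the numeric 1:5 never matches, which is why col2 is not returned by any of the answers:

```r
vals <- c("a", "b", "n", "w")

# One logical value per element of the left-hand side
c("a", "z", "n") %in% vals
#> [1]  TRUE FALSE  TRUE

# any() collapses that into a single TRUE/FALSE, e.g. for one column
any(c("c", "d") %in% vals)
#> [1] FALSE

# Numbers are compared as character here, so none of 1:5 matches
1:5 %in% vals
#> [1] FALSE FALSE FALSE FALSE FALSE
```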
We may use select with where:
library(dplyr)
df %>%
  select(where(~ any(. %in% vals, na.rm = TRUE)))
Output:
  col1 col3
1    a    j
2    b    k
3    c    l
4    d    m
5    a    n
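If you only need the names of the matching columns rather than the subset itself, one option (a sketch, not part of the original answer) is to pipe the same select() call into names():

```r
library(dplyr)

df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# Same selection as above, reduced to just the column names
matching_cols <- df %>%
  select(where(~ any(.x %in% vals))) %>%
  names()
matching_cols
#> [1] "col1" "col3"
```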
A similar option in base R is Filter (the \(x) lambda shorthand requires R >= 4.1):
Filter(\(x) any(x %in% vals, na.rm = TRUE), df)
  col1 col3
1    a    j
2    b    k
3    c    l
4    d    m
5    a    n
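A variant of the base R approach, sketched with vapply() instead of Filter(): it builds an explicit logical mask that can be used both to subset and to list the names. It uses function(x) rather than \(x), so it should also run on R versions before 4.1:

```r
df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# vapply() guarantees exactly one logical per column (type-safe, unlike sapply())
has_match <- vapply(df, function(x) any(x %in% vals), logical(1))
has_match
#>  col1  col2  col3
#>  TRUE FALSE  TRUE

# The mask subsets the columns...
df[has_match]

# ...or picks out their names
names(df)[has_match]
#> [1] "col1" "col3"
```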
Another tidyverse option is to use keep() from purrr, which returns the same two columns (col1 and col3):
library(purrr)
df %>%
  keep(~ any(.x %in% vals))
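Along the same lines, purrr::map_lgl() returns one TRUE/FALSE per column, which answers the "identify the columns" part of the question directly. A sketch, assuming purrr is installed:

```r
library(purrr)

df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# Named logical vector: does each column contain any value from vals?
hits <- map_lgl(df, ~ any(.x %in% vals))
hits

# Names of the columns that matched
names(which(hits))
#> [1] "col1" "col3"
```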