R - identify columns that contain any of a set of values

Tags: r, tidyverse

I have a data frame like this:

df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14])
 df
  col1 col2 col3
1    a    1    j
2    b    2    k
3    c    3    l
4    d    4    m
5    a    5    n

I would like to identify the columns that contain any value from the following vector:

vals <- c("a", "b", "n", "w")

A tidy solution would be awesome!

tzema asked Dec 29 '21 19:12


2 Answers

We may use select() with where() to keep only the columns in which any value matches:

library(dplyr)
df %>% 
   select(where(~ any(. %in% vals, na.rm = TRUE)))

-output

   col1 col3
1    a    j
2    b    k
3    c    l
4    d    m
5    a    n

A similar option in base R uses Filter() (the \(x) lambda shorthand requires R 4.1 or later):

Filter(\(x)  any(x %in% vals, na.rm = TRUE), df)
  col1 col3
1    a    j
2    b    k
3    c    l
4    d    m
5    a    n
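
If only the names of the matching columns are wanted, rather than the subsetted data frame, the same predicate can be turned into a logical vector over the columns. This is a minimal base R sketch under that assumption; the names has_match and matching_cols are introduced here for illustration and are not part of the original answer.

```r
# Data and lookup values copied from the question
df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# One logical per column: TRUE if any cell matches a value in vals
# (has_match is an illustrative name, not from the original answer)
has_match <- vapply(df, function(x) any(x %in% vals), logical(1))

# Names of the columns that contain at least one of the values
matching_cols <- names(df)[has_match]
matching_cols
```

Note that %in% compares values after coercing them to a common type, so a numeric column containing 1 would match the string "1" if it were in vals.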
akrun answered Oct 07 '22 23:10


Another tidyverse option is to use keep() from purrr.

library(purrr)

df %>% 
  keep( ~ any(.x %in% vals))
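
For reference, here is a minimal sketch of the same keep() call applied to the question's data. It is written without the pipe so that purrr is the only package needed, and the name kept is introduced here for illustration only. For this data it should retain col1 and col3.

```r
library(purrr)

# Data and lookup values copied from the question
df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# keep() retains the elements (here, columns) for which the predicate is TRUE
# (kept is an illustrative name, not from the original answer)
kept <- keep(df, ~ any(.x %in% vals))

# Column names of the result
names(kept)
```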
Adam answered Oct 07 '22 22:10