How to find if ANY column has a specific value I am looking for?

Question

   id first  middle  last       Age
    1 Carol  Jenny   Smith      15
    2 Sarah  Carol   Roberts    20
    3 Josh   David   Richardson 22

I am trying find a specific name in ANY of the name columns (first, middle, last). For example, if I found anyone with a name Carol (doesn't matter if it's the first/middle/last name), I want to mutate a column 'Carol' and give 1. So what I want is the following

   id first  middle  last       Age  Carol
    1 Carol   Jenny   Smith      15   1
    2 Sarah  Carol   Roberts    20    1
    3 Josh   David   Richardson 22    0

I have been trying ifelse(c(first, middle,last) == "Carol" , 1, 0 ) or "Carol" %in% first...etc but for some reason I can only work on one column instead of multiple columns.. Could anyone help me please? Thank you in advance!

Ronak Shah · Accepted Answer

We can use rowSums

df$Carol <- as.integer(rowSums(df[2:4] == "Carol") > 0)

df
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

If we need it as a function

fun <- function(df, value) {
   as.integer(rowSums(df[2:4] == value) > 0)
}

fun(df, "Carol")
#[1] 1 1 0
fun(df, "Sarah")
#[1] 0 1 0

but this assumes the columns you want to search are at position 2:4.

To give more flexibility with column position

fun <- function(df, cols, value) {
   as.integer(rowSums(df[cols] == value) > 0)
 }
fun(df, c("first", "last","middle"), "Carol")
#[1] 1 1 0
fun(df, c("first", "last","middle"), "Sarah")
#[1] 0 1 0

eipi10 · Answer

Here's a tidyverse option. We first reshape the data to long format, group by id, and find levels of id that have the desired name in at least one row. Then we reshape back to wide format.

library(tidyverse)

df %>% 
  gather(key, value, first:last) %>% 
  group_by(id) %>% 
  mutate(Carol = as.numeric(any(value=="Carol"))) %>% 
  spread(key, value)

     id   Age Carol first last       middle
1     1    15     1 Carol Smith      Jenny 
2     2    20     1 Sarah Roberts    Carol 
3     3    22     0 Josh  Richardson David

Or, as a function:

find.target = function(data, target) {

  data %>% 
    gather(key, value, first:last) %>% 
    group_by(id) %>% 
    mutate(!!target := as.numeric(any(value==target))) %>% 
    spread(key, value) %>% 
    # Move new target column to end
    select(-target, target)

}

find.target(df, "Carol")
find.target(df, "Sarah")

You could also do several at once. For example:

map(c("Sarah", "Carol", "David"), ~ find.target(df, .x)) %>% 
  reduce(left_join)

     id   Age first last       middle Sarah Carol David
1     1    15 Carol Smith      Jenny      0     1     0
2     2    20 Sarah Roberts    Carol      1     1     0
3     3    22 Josh  Richardson David      0     0     1

akrun · Answer

Using tidyverse

library(tidyverse)
f1 <- function(data, wordToCompare, colsToCompare) {
          wordToCompare <- enquo(wordToCompare)
          data %>%
              select(colsToCompare) %>%
              mutate(!! wordToCompare :=  map(.,  ~ 
       .x == as_label(wordToCompare)) %>% 
           reduce(`|`) %>%
           as.integer)
              }
          
f1(df1, Carol, c("first", 'middle', 'last'))
# first middle       last Carol
#1 Carol  Jenny      Smith     1
#2 Sarah  Carol    Roberts     1
#3  Josh  David Richardson     0

f1(df1, Sarah, c("first", 'middle', 'last'))
#   first middle       last Sarah
#1 Carol  Jenny      Smith     0
#2 Sarah  Carol    Roberts     1
#3  Josh  David Richardson     0

Or this can also be done with pmap

df1 %>%
  mutate(Carol = pmap_int(.[c('first', 'middle', 'last')],
          ~ +('Carol' %in% c(...))))
#   id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

which can be wrapped into a function

f2 <- function(data, wordToCompare, colsToCompare) {
      wordToCompare <- enquo(wordToCompare)
      data %>%
           mutate(!! wordToCompare := pmap_int(.[colsToCompare],
          ~ +(as_label(wordToCompare) %in% c(...))))
  } 

f2(df1, Carol, c("first", 'middle', 'last'))
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

NOTE: Both the tidyverse methods doesn't require any reshaping

With base R, we can loop through the 'first', 'middle', 'last' column and use == for comparison to get a list of logical vectors, which we Reduce to a single logical vector with | and coerce it to binary with +

df1$Carol <- +(Reduce(`|`, lapply(df1[2:4], `==`, 'Carol')))
df1
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1 
#3  3  Josh  David Richardson  22     0

NOTE: There are dupes for this post. For e.g. here

data

df1 <- structure(list(id = 1:3, first = c("Carol", "Sarah", "Josh"), 
middle = c("Jenny", "Carol", "David"), last = c("Smith", 
"Roberts", "Richardson"), Age = c(15L, 20L, 22L)),
  class = "data.frame", row.names = c(NA, 
 -3L))

LocoGris · Answer

A solution using apply family

df$Carol = lapply(1:nrow(df), function(x) any(df[x,]=="Carol))

How to find if ANY column has a specific value I am looking for?

Tags:

r

filter

dplyr

JNB

4 Answers

Ronak Shah

eipi10

data

akrun

LocoGris

Recent Activity

Donate For Us

How to find if ANY column has a specific value I am looking for?

Tags:

r

filter

dplyr

JNB

4 Answers

Ronak Shah

eipi10

data

akrun

LocoGris

Related questions

Recent Activity

Donate For Us