
Creating one variable from a list of variables in R?

Tags: r, dplyr, tidyverse

I have a sequence of variables in a dataframe (over 100), and I would like to create an indicator variable that flags whether particular text patterns are present in any of them. Below is an example with three variables. One solution I've found is tidyr::unite() followed by dplyr::mutate(), but I'm interested in a solution where I do not have to unite the variables.

c1<-c("T1", "X1", "T6", "R5")
c2<-c("R4", "C6", "C7", "X3")
c3<-c("C5", "C2", "X4", "T2")

df<-data.frame(c1, c2, c3)

  c1 c2 c3
1 T1 R4 C5
2 X1 C6 C2
3 T6 C7 X4
4 R5 X3 T2

code.vec<-c("T1", "T2", "T3", "T4") #Text patterns of interest
code_regex<-paste(code.vec, collapse="|")

library(dplyr)
library(tidyr)

new<-df %>% 
  unite(all_c, c1:c3, remove=FALSE) %>% 
  mutate(indicator=if_else(grepl(code_regex, all_c), 1, 0)) %>% 
  select(-all_c)

  c1 c2 c3 indicator
1 T1 R4 C5 1
2 X1 C6 C2 0
3 T6 C7 X4 0
4 R5 X3 T2 1

The example above produces the desired result, but I feel there should be a way of doing this in the tidyverse without uniting the variables. SAS handles this very easily with an ARRAY statement and a DO loop, and I'm hoping R has a good way of handling it too.

The real dataframe has many additional variables besides the "c" fields to search, so a solution that searches every column would require first subsetting the dataframe to the variables I want to search, and then joining the result back to the other variables.

patward5656 asked Apr 22 '19 14:04


2 Answers

In base R, we can use sapply with grepl to test every column against the pattern, which gives a logical matrix. rowSums then counts the matches in each row, and rows with at least one match are assigned 1.

df$indicator <- as.integer(rowSums(sapply(df, grepl, pattern = code_regex)) > 0)

df
#  c1 c2 c3 indicator
#1 T1 R4 C5         1
#2 X1 C6 C2         0
#3 T6 C7 X4         0
#4 R5 X3 T2         1

If the dataframe has other columns and we want to search only those whose names start with "c", we can use grep to select them first.

cols <- grep("^c", names(df))
as.integer(rowSums(sapply(df[cols], grepl, pattern = code_regex)) > 0)
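
To see that restricting the search leaves the other columns untouched, here is a small self-contained sketch. The `id` and `note` columns and the name `df2` are hypothetical, added only for illustration; the stricter name pattern `^c[0-9]+$` is likewise an assumption about how the real search columns are named.

```r
# Hypothetical data: c1..c3 are searched, id and note are not
df2 <- data.frame(
  id   = 1:4,
  c1   = c("T1", "X1", "T6", "R5"),
  c2   = c("R4", "C6", "C7", "X3"),
  c3   = c("C5", "C2", "X4", "T2"),
  note = c("x", "T1 here", "y", "z"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# Names made of "c" followed by digits; "note" and "id" are excluded
cols <- grep("^c[0-9]+$", names(df2), value = TRUE)

# Logical matrix: one row per observation, one column per searched variable
hits <- sapply(df2[cols], grepl, pattern = code_regex)

# Flag rows with at least one match; the unsearched columns stay in df2
df2$indicator <- as.integer(rowSums(hits) > 0)
df2
```

Row 2 contains "T1" only in `note`, which is not searched, so its indicator stays 0. No subsetting and re-joining is needed because the result is assigned back onto the full dataframe.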

Using dplyr we can do

library(dplyr)

# transmute_at() keeps only c1:c3, so rowSums() never sees unrelated columns
df$indicator <- as.integer(df %>%
              transmute_at(vars(c1:c3), ~grepl(code_regex, .)) %>%
              rowSums() > 0)
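
In more recent dplyr releases the same idea can be written in a single mutate() call. The following is a minimal sketch, assuming dplyr 1.0.4 or later, where `if_any()` is available; it is an alternative, not part of the answer's original code.

```r
library(dplyr)

df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# if_any() applies the predicate to each selected column and
# combines the results row-wise with a logical OR
new <- df %>%
  mutate(indicator = as.integer(if_any(c1:c3, ~ grepl(code_regex, .x))))
new
```

Because the column selection (`c1:c3`) happens inside `if_any()`, any other columns in the dataframe are carried through unchanged.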
Ronak Shah answered Oct 16 '22 09:10


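With the tidyverse, we can detect the pattern in every column with str_detect and then collapse the resulting logical columns with reduce. Combining them with `|` rather than summing keeps the result a 0/1 indicator even when a row matches in more than one column.

library(tidyverse)
df %>%
    mutate_all(str_detect, pattern = code_regex) %>%
    reduce(`|`) %>% 
    as.integer() %>% 
    mutate(df, indicator = .)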
#  c1 c2 c3 indicator
#1 T1 R4 C5         1
#2 X1 C6 C2         0
#3 T6 C7 X4         0
#4 R5 X3 T2         1

Or using base R

as.integer(Reduce(`|`, lapply(df, grepl, pattern = code_regex)))
#[1] 1 0 0 1
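
One caveat applies to every regex-based approach here: a pattern built with `paste(code.vec, collapse = "|")` matches substrings, so "T1" also matches a value such as "T10". If whole-value matches are wanted (an assumption; the question does not say), a sketch with hypothetical values `x` could anchor the pattern or use `%in%`:

```r
code.vec <- c("T1", "T2", "T3", "T4")

# Hypothetical values, chosen to show the difference
x <- c("T1", "T10", "XT2", "R5")

loose  <- paste(code.vec, collapse = "|")   # substring match
strict <- paste0("^(", loose, ")$")         # whole-value match

grepl(loose, x)
#[1]  TRUE  TRUE  TRUE FALSE
grepl(strict, x)
#[1]  TRUE FALSE FALSE FALSE
x %in% code.vec                             # same as the anchored regex here
#[1]  TRUE FALSE FALSE FALSE
```

Either `strict` or `%in%` can be dropped into the column-wise solutions above in place of `code_regex`.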
akrun answered Oct 16 '22 10:10