How do I create a new column based on multiple conditions from multiple columns?

I'm trying to add a new column to a data frame based on several conditions from other columns. I have the following data:

> commute <- c("walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry")
> kids <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
> distance <- c(1, 12, 5, 25, 7, 2, "", 8, 19, 7, "", 4, 16, 12, 7)
> 
> df = data.frame(commute, kids, distance)
> df
   commute kids distance
1     walk  Yes        1
2     bike  Yes       12
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7
6     walk  Yes        2
7     bike   No         
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7
11    walk   No         
12    bike   No        4
13  subway  Yes       16
14   drive   No       12
15   ferry  Yes        7

If the following three conditions are met:

commute = walk OR bike OR subway OR ferry
AND
kids = Yes
AND
distance is less than 10

Then I'd like a new column called get.flyer to equal "Yes". The final data frame should look like this:

   commute kids distance get.flyer
1     walk  Yes        1       Yes
2     bike  Yes       12          
3   subway   No        5          
4    drive   No       25          
5    ferry  Yes        7       Yes
6     walk  Yes        2       Yes
7     bike   No                   
8   subway   No        8          
9    drive  Yes       19          
10   ferry  Yes        7       Yes
11    walk   No                   
12    bike   No        4          
13  subway  Yes       16          
14   drive   No       12          
15   ferry  Yes        7       Yes
Ankie asked Sep 09 '16 07:09


1 Answer

For example, to check whether the value in first_column_name is contained in second_column_name and write the result to new_column:

df$new_column <- apply(df, 1, function(x) grepl(x['first_column_name'], x['second_column_name'], fixed = TRUE))
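
To try the line above, here is a minimal runnable sketch. The toy data is hypothetical and only the column names follow the placeholders used in this answer; the third row is there to show what fixed = TRUE changes.

```r
# Hypothetical toy data (not from the question); names match the placeholders above.
df <- data.frame(
  first_column_name  = c("cat", "dog", "a.c"),
  second_column_name = c("concatenate", "bird", "abc"),
  stringsAsFactors = FALSE
)

# apply(df, 1, ...) converts df to a character matrix and passes each row
# to the function as a named character vector.
df$new_column <- apply(df, 1, function(x) {
  grepl(x["first_column_name"], x["second_column_name"], fixed = TRUE)
})

print(df$new_column)
# TRUE FALSE FALSE: "cat" occurs in "concatenate"; "a.c" is matched literally,
# so it does not match "abc" as it would if treated as a regular expression.
```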

Details:

df$new_column <-    # create a new column named new_column on df
  apply(df, 1,      # apply the function below to df; `1` means once per row
    function(x)     # the function run on each row; `x` is the current row as a named vector
      # grepl tests whether the value of first_column_name occurs in second_column_name;
      # fixed = TRUE matches it as plain text instead of as a regular expression
      grepl(x['first_column_name'], x['second_column_name'], fixed = TRUE))
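
Applied to the data in the question, the same row-wise pattern might look like the sketch below. It rests on two assumptions that the question does not state: blank distances are treated as missing and never qualify, and rows that fail the conditions get an empty string. A vectorized ifelse() version (a different technique from the apply() call above) is included for comparison.

```r
# Data from the question; "" marks a missing distance, so the column is character.
commute <- rep(c("walk", "bike", "subway", "drive", "ferry"), times = 3)
kids <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes",
          "No", "No", "Yes", "No", "Yes")
distance <- c(1, 12, 5, 25, 7, 2, "", 8, 19, 7, "", 4, 16, 12, 7)
df <- data.frame(commute, kids, distance, stringsAsFactors = FALSE)

flyer_modes <- c("walk", "bike", "subway", "ferry")

# Row-wise version: each row arrives as a character vector, so distance
# is converted to a number inside the function ("" becomes NA).
df$get.flyer <- apply(df, 1, function(x) {
  d <- as.numeric(x["distance"])
  ok <- x["commute"] %in% flyer_modes &&
    x["kids"] == "Yes" &&
    !is.na(d) && d < 10
  if (ok) "Yes" else ""
})

# Vectorized alternative: build one logical vector, then map it with ifelse().
dist_num <- as.numeric(as.character(df$distance))  # assumption: "" means missing
qualifies <- df$commute %in% flyer_modes &
  df$kids == "Yes" &
  !is.na(dist_num) & dist_num < 10
flyer_vec <- ifelse(qualifies, "Yes", "")

print(df)
```

Under the three stated conditions, rows 1, 5, 6, 10 and 15 get "Yes". Rows 2 and 13 do not, because their distances (12 and 16) are not less than 10. On a large data frame the vectorized form is usually the faster of the two, since it avoids calling a function once per row.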
Tomer Ben David answered Sep 24 '22 23:09
