
Enter value from df based on condition across multiple columns into new variable

Tags: r, dplyr

I am sure I am not the only person who has asked this, but after hours of searching with no luck, I need to ask the question myself.

I have a df (rp) like so:

rp <- structure(list(agec1 = c(7, 16, 11, 11, 17, 17), 
               agec2 = c(6, 12, 9, 9, 16, 15), 
               agec3 = c(2, 9, 9, 9, 14, NA), 
               agec4 = c(NA, 7, 9, 9, 13, NA), 
               agec5 = c(NA, 4, 7, 7, 10, NA), 
               agec6 = c(NA, NA, 6, 6, 9, NA), 
               agec7 = c(NA, NA, NA, NA, 7, NA), 
               agec8 = c(NA, NA, NA, NA, 5, NA)), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Each observation in agecX is the age of one of a parent's children (up to 8 children). I would like to create a new column, "agec5_12", that contains the age of the oldest child aged 5-12. My df would then look like this:

rpage <- structure(list(agec1 = c(7, 16, 11, 11, 17, 17), 
               agec2 = c(6, 12, 9, 9, 16, 15), 
               agec3 = c(2, 9, 9, 9, 14, NA), 
               agec4 = c(NA, 7, 9, 9, 13, NA), 
               agec5 = c(NA, 4, 7, 7, 10, NA), 
               agec6 = c(NA, NA, 6, 6, 9, NA), 
               agec7 = c(NA, NA, NA, NA, 7, NA), 
               agec8 = c(NA, NA, NA, NA, 5, NA), 
               agec5_12 = c(7, 12, 11, 11, 10, NA)), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Notes about my data:

  • Ages are not always in a consistent order within a row (i.e. not always youngest to oldest or oldest to youngest)
  • It is possible for a row to have no children aged within this range (in which case I would like NA to be returned)

I have tried writing a function and applying it using rowwise and mutate:

fun.age5_12 <- function(x){
                 x[which(x == max(x[(x > 4) & (x < 13)], na.rm = TRUE))]
                }
rpage <- rp %>%
         select(-c(20:21, 199:200)) %>%
         rowwise() %>% 
         mutate(agec5_12 = fun.age5_12(c(1:8)))

However, this returns all obs as "12". Ideally I would like to do this using dplyr. Any suggestions using mutate or ifelse and not necessarily with functions are fine.

Thank you

Charlotte Jelleyman asked Nov 16 '18 02:11


1 Answer

I know you wanted tidyverse but here's one base R way:

data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17), 
  agec2 = c(6, 12, 9, 9, 16, 15), 
  agec3 = c(2, 9, 9, 9, 14, NA), 
  agec4 = c(NA, 7, 9, 9, 13, NA), 
  agec5 = c(NA, 4, 7, 7, 10, NA), 
  agec6 = c(NA, NA, 6, 6, 9, NA), 
  agec7 = c(NA, NA, NA, NA, 7, NA), 
  agec8 = c(NA, NA, NA, NA, 5, NA), 
  stringsAsFactors = FALSE
) -> rp

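for (i in seq_len(nrow(rp))) {
  # Flatten row i into a plain numeric vector of ages
  agec5_12 <- unlist(rp[i,], use.names = FALSE)
  # Keep only non-missing ages between 5 and 12 inclusive
  agec5_12 <- agec5_12[agec5_12 >= 5 & agec5_12 <= 12 & !is.na(agec5_12)]
  # Oldest qualifying age, or NA when no child falls in the range
  rp[i, "agec5_12"] <- if (length(agec5_12)) max(agec5_12) else NA_real_
}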

rp
##   agec1 agec2 agec3 agec4 agec5 agec6 agec7 agec8 agec5_12
## 1     7     6     2    NA    NA    NA    NA    NA        7
## 2    16    12     9     7     4    NA    NA    NA       12
## 3    11     9     9     9     7     6    NA    NA       11
## 4    11     9     9     9     7     6    NA    NA       11
## 5    17    16    14    13    10     9     7     5       10
## 6    17    15    NA    NA    NA    NA    NA    NA       NA
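
Since the question asked for dplyr, the same row-by-row logic could probably be written with rowwise() and c_across(). This is only a sketch: it assumes dplyr 1.0.0 or later (where c_across() is available), and oldest_in_range() is a hypothetical helper name, not something from the question or from dplyr.

```r
library(dplyr)

rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

# Hypothetical helper: oldest age within [lo, hi], or NA if none qualifies
oldest_in_range <- function(x, lo = 5, hi = 12) {
  in_range <- x[!is.na(x) & x >= lo & x <= hi]
  if (length(in_range)) max(in_range) else NA_real_
}

rpage <- rp %>%
  rowwise() %>%                                      # evaluate mutate() one row at a time
  mutate(agec5_12 = oldest_in_range(c_across(agec1:agec8))) %>%
  ungroup()                                          # drop the rowwise grouping afterwards
```

Passing c_across(agec1:agec8) hands the helper the actual ages in the current row, whereas c(1:8) in the question's attempt passes the literal numbers 1 to 8.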

The for loop shows the idiom, but an sapply() solution is a lot faster:

rp$agec5_12 <- sapply(1:nrow(rp), function(i) {
  agec5_12 <- unlist(rp[i,], use.names = FALSE)
  agec5_12 <- agec5_12[agec5_12 >= 5 & agec5_12 <= 12 & !is.na(agec5_12)]
  if (length(agec5_12)) max(agec5_12) else NA_integer_
})
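
Both versions above still visit the rows one at a time. A column-wise alternative, sketched here under the assumption that the eight age columns are numeric and named agec1 to agec8, is to blank out every age outside 5-12 and then take the element-wise maximum with pmax(). The names age_cols and masked are just for illustration.

```r
rp <- data.frame(
  agec1 = c(7, 16, 11, 11, 17, 17),
  agec2 = c(6, 12, 9, 9, 16, 15),
  agec3 = c(2, 9, 9, 9, 14, NA),
  agec4 = c(NA, 7, 9, 9, 13, NA),
  agec5 = c(NA, 4, 7, 7, 10, NA),
  agec6 = c(NA, NA, 6, 6, 9, NA),
  agec7 = c(NA, NA, NA, NA, 7, NA),
  agec8 = c(NA, NA, NA, NA, 5, NA)
)

age_cols <- paste0("agec", 1:8)

# Replace every age outside 5-12 (and every missing age) with NA, column by column
masked <- lapply(rp[age_cols], function(x) {
  ifelse(!is.na(x) & x >= 5 & x <= 12, x, NA_real_)
})

# Element-wise maximum across the masked columns; with na.rm = TRUE a row
# comes out NA only when all of its masked values are NA
rp$agec5_12 <- do.call(pmax, c(masked, na.rm = TRUE))
```

Because the work is done on whole columns rather than per row, this should scale better on large data frames, though that has not been benchmarked here.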

hrbrmstr answered Sep 20 '22 23:09