I am working with a large dataset where much of the data was entered twice. This means that many of the variables are represented by pairs of columns: column.1
with the data entered by one person, and column.2
where the same data was entered by a different person. I want to create a "master" column called simply column
that first draws from column.1
and then, if column.1
is NA
, draws from column.2
.
Here is an example of what I am trying to do with made-up data:
mydata <- data.frame(name = c("Sarah","Ella","Carmen","Dinah","Billie"),
cheese.1 = c(1,4,NA,6,NA),
cheese.2 = c(1,4,3,5,NA),
milk.1 = c(NA,2,0,4,NA),
milk.2 = c(1,2,1,4,2),
tofu.1 = c("yum","yum",NA,"gross", NA),
tofu.2 = c("gross", "yum", "yum", NA, "gross"))
For example, the code below shows an example of what I want to do for a single pair of columns.
mydata %>% mutate(cheese = ifelse(is.na(cheese.1), cheese.2, cheese.1))
#OUTPUT:
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 cheese
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
However, I want to automate the process rather than doing each manually. Below is my attempt at automating the process, using a list (col.list
) of the column pairs for which I want to create new "master" columns:
col.list = c("cheese","milk","tofu")
lapply(col.list, FUN = function(x) {
v <- as.name({{x}})
v.1 <- as.name(paste0({{x}}, ".1"))
v.2 <- as.name(paste0(({{x}}), ".2"))
mydata %>% mutate(v = ifelse(is.na({{v.1}}), {{v.2}}, {{v.1}}))
})
#OUTPUT:
[[1]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
[[2]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 2
3 Carmen NA 3 0 1 <NA> yum 0
4 Dinah 6 5 4 4 gross <NA> 4
5 Billie NA NA NA 2 <NA> gross 2
[[3]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross yum
2 Ella 4 4 2 2 yum yum yum
3 Carmen NA 3 0 1 <NA> yum yum
4 Dinah 6 5 4 4 gross <NA> gross
5 Billie NA NA NA 2 <NA> gross gross
The problems with this attempt are:
cheese
, milk
and tofu
rather than all be called v
)col.list
).(1) You have to wrap v
into the curly-curly operator and use :=
:
library(dplyr)
col.list <- c("cheese","milk","tofu")
lapply(col.list, FUN = function(x) {
v <- as.name({{x}})
v.1 <- as.name(paste0({{x}}, ".1"))
v.2 <- as.name(paste0(({{x}}), ".2"))
mydata %>% mutate({{ v }} = ifelse(is.na({{v.1}}), {{v.2}}, {{v.1}}))
})
returns
[[1]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 cheese
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
[...]
which is one step closer to your desired output.
(2) But to get your desired output, I suggest using purrr
:
library(purrr)
library(dplyr)
col.list %>%
map(~mydata %>%
select(name, starts_with(.x)) %>%
mutate({{ .x }} := ifelse(
is.na(!!sym(paste0(.x, ".1"))),
!!sym(paste0(.x, ".2")),
!!sym(paste0(.x, ".1"))
)
)
) %>%
reduce(left_join, by = "name")
This returns
name cheese.1 cheese.2 cheese milk.1 milk.2 milk tofu.1 tofu.2 tofu
1 Sarah 1 1 1 NA 1 1 yum gross yum
2 Ella 4 4 4 2 2 2 yum yum yum
3 Carmen NA 3 3 0 1 0 <NA> yum yum
4 Dinah 6 5 6 4 4 4 gross <NA> gross
5 Billie NA NA NA NA 2 2 <NA> gross gross
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With