I have a tibble with information about diagnoses:
data <- tibble(
id = c(1:10),
diagnosis_1 = c("F32", "F431", "R58", "S32", "F11", NA, NA, "Y67", "F32", "Z032"),
diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
diagnosis_3 = c("F40", NA, "R67", "F431", NA, "F60", "S58", "R68", "F11", NA),
diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA)
)
As a part of the cleaning process, I have removed all diagnoses not fulfilling certain criteria (i.e. not starting with the letter F, G, or Z). With the following code:
data$diagnosis_1[str_sub(data$diagnosis_1, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_2[str_sub(data$diagnosis_2, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_3[str_sub(data$diagnosis_3, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_4[str_sub(data$diagnosis_4, 1,1) %in% c("R", "S", "Y")] <- NA
Ending up with this tibble:
I now need to move the data to the left to fill the columns from left to right (i.e diagnosis_1 not being empty if diagnosis_2, diagnosis_3 or diagnosis_4 has data). I have tried using ifelse() as it is vectorized but I can`t seem to get it to work with several nested ifelse().
ifelse(is.na(data$diagnosis_1), data$diagnosis_2, data$diagnosis_1))
All suggestions are much appreciated.
Edit: adding expected output:
Using dplyr and tidyr. Reshape from wide to long, exclude "^RSY"
and NA
diagnosis, reshape long to wide.
library(dplyr)
library(tidyr)
gather(data, key = "k", value = "v", -id) %>%
filter(!(grepl("^[R|S|Y]", v) | is.na(v))) %>%
group_by(id) %>%
mutate(diagN = paste0("diagnosis_", row_number())) %>%
select(-k) %>%
spread(key = "diagN", value = "v") %>%
ungroup()
# # A tibble: 10 x 3
# id diagnosis_1 diagnosis_2
# <int> <chr> <chr>
# 1 1 F32 F40
# 2 2 F431 NA
# 3 3 F65 NA
# 4 4 F431 NA
# 5 5 F11 F19
# 6 6 F60 NA
# 7 7 G35 NA
# 8 8 F32 NA
# 9 9 F32 F11
# 10 10 Z032 NA
We first replace
values which start with either "R", "S" or "Y" to NA
and then left shift the non-NA values.
data[-1] <- lapply(data[-1], function(x) replace(x, grepl("^[R|S|Y]", x), NA))
data[] <- t(apply(data, 1, function(x) `length<-`(na.omit(x), length(x))))
data
# A tibble: 10 x 5
# id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
# <chr> <chr> <chr> <chr> <chr>
# 1 " 1" F32 F40 NA NA
# 2 " 2" F431 NA NA NA
# 3 " 3" F65 NA NA NA
# 4 " 4" F431 NA NA NA
# 5 " 5" F11 F19 NA NA
# 6 " 6" F60 NA NA NA
# 7 " 7" G35 NA NA NA
# 8 " 8" F32 NA NA NA
# 9 " 9" F32 F11 NA NA
#10 10 Z032 NA NA NA
Shifting the non-NA value to left has been taken from David's answer from here. You can try any other approach to shift values from the same question as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With