Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Moving data from right to left column in a tibble

Tags:

r

dplyr

stringr

I have a tibble with information about diagnoses:

data <- tibble(
  id = c(1:10),
  diagnosis_1 = c("F32", "F431", "R58", "S32", "F11", NA, NA, "Y67", "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, "R67", "F431", NA, "F60", "S58", "R68", "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA)
)

As a part of the cleaning process, I have removed all diagnoses not fulfilling certain criteria (i.e. not starting with the letter F, G, or Z). With the following code:

data$diagnosis_1[str_sub(data$diagnosis_1, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_2[str_sub(data$diagnosis_2, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_3[str_sub(data$diagnosis_3, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_4[str_sub(data$diagnosis_4, 1,1) %in% c("R", "S", "Y")] <- NA

Ending up with this tibble:

enter image description here

I now need to move the data to the left to fill the columns from left to right (i.e diagnosis_1 not being empty if diagnosis_2, diagnosis_3 or diagnosis_4 has data). I have tried using ifelse() as it is vectorized but I can`t seem to get it to work with several nested ifelse().

ifelse(is.na(data$diagnosis_1), data$diagnosis_2, data$diagnosis_1))

All suggestions are much appreciated.

Edit: adding expected output:

enter image description here

like image 464
MartinM Avatar asked Jan 28 '23 19:01

MartinM


2 Answers

Using dplyr and tidyr. Reshape from wide to long, exclude "^RSY" and NA diagnosis, reshape long to wide.

library(dplyr)
library(tidyr)

gather(data, key = "k", value = "v", -id) %>% 
  filter(!(grepl("^[R|S|Y]", v) | is.na(v))) %>% 
  group_by(id) %>% 
  mutate(diagN = paste0("diagnosis_", row_number())) %>% 
  select(-k) %>% 
  spread(key = "diagN", value = "v") %>% 
  ungroup()

# # A tibble: 10 x 3
#       id diagnosis_1 diagnosis_2
#    <int> <chr>       <chr>      
#  1     1 F32         F40        
#  2     2 F431        NA         
#  3     3 F65         NA         
#  4     4 F431        NA         
#  5     5 F11         F19        
#  6     6 F60         NA         
#  7     7 G35         NA         
#  8     8 F32         NA         
#  9     9 F32         F11        
# 10    10 Z032        NA  
like image 160
zx8754 Avatar answered Feb 11 '23 17:02

zx8754


We first replace values which start with either "R", "S" or "Y" to NA and then left shift the non-NA values.

data[-1] <- lapply(data[-1], function(x) replace(x, grepl("^[R|S|Y]", x), NA))  
data[] <- t(apply(data, 1, function(x) `length<-`(na.omit(x), length(x))))

data
# A tibble: 10 x 5
#     id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
#   <chr> <chr>       <chr>       <chr>       <chr>      
# 1 " 1"  F32         F40         NA          NA         
# 2 " 2"  F431        NA          NA          NA         
# 3 " 3"  F65         NA          NA          NA         
# 4 " 4"  F431        NA          NA          NA         
# 5 " 5"  F11         F19         NA          NA         
# 6 " 6"  F60         NA          NA          NA         
# 7 " 7"  G35         NA          NA          NA         
# 8 " 8"  F32         NA          NA          NA         
# 9 " 9"  F32         F11         NA          NA         
#10  10   Z032        NA          NA          NA    

Shifting the non-NA value to left has been taken from David's answer from here. You can try any other approach to shift values from the same question as well.

like image 26
Ronak Shah Avatar answered Feb 11 '23 19:02

Ronak Shah