I have a dataset that looks like this:
before = data.frame(diag1 = c(1,NA, 1, NA, NA, 1), diag2 = c(NA, NA, NA, 2, NA, NA), diag3 = c(3, NA, NA, NA, 3, 3), diag4 = c(4, 4, NA, NA, 4, NA))
  diag1 diag2 diag3 diag4
1     1    NA     3     4
2    NA    NA    NA     4
3     1    NA    NA    NA
4    NA     2    NA    NA
5    NA    NA     3     4
6     1    NA     3    NA
I have been trying to find a solution whose end result is a new column named "diagnoses" that looks like this:
  diagnoses
1     1,3,4
2         4
3         1
4         2
5       3,4
6       1,3
This is a much smaller example of my real problem. The dataset I am working on has over 70 diagnosis columns, with no more than 3 numeric values in each row. I have tried the strsplit, separate, and unite functions, but I still haven't found an elegant solution.
I have also tried apply with paste:
dat$diagnoses<- apply( (dat[ , cols]), 1, function(x) paste(na.omit(x),collapse=", ") )
However, it yields a string with many extra commas.
I tried gsub to remove the extra commas, but I still have not been able to get the result I hoped for.
This is the output I have been able to get: "1,,3,4,," ",,,4,," " 1,,,,," ",2,,,," ",,3,4,," "1,,3,,,"
An option is to loop through the rows with apply, remove the NA elements, and paste the rest together:
before$new <- apply(before, 1, function(x) toString(x[!is.na(x)]))
before$new
#[1] "1, 3, 4" "4" "1" "2" "3, 4" "1, 3"
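Note that toString joins with ", " (comma plus space), while the desired output uses a bare comma. A minimal sketch of a variant that produces "1,3,4" and only looks at the diagnosis columns is below. The column pattern "^diag[0-9]+$" and the helper name collapse_row are assumptions for illustration, not part of the original answer. The helper also drops blank strings, on the guess that the extra commas in the question come from empty cells rather than true NA values.

```r
# Example data from the question
before <- data.frame(
  diag1 = c(1, NA, 1, NA, NA, 1),
  diag2 = c(NA, NA, NA, 2, NA, NA),
  diag3 = c(3, NA, NA, NA, 3, 3),
  diag4 = c(4, 4, NA, NA, 4, NA)
)

# Assumption: diagnosis columns are named diag1, diag2, ...
diag_cols <- grep("^diag[0-9]+$", names(before), value = TRUE)

# Hypothetical helper: drop NA and blank entries, then join with ","
collapse_row <- function(x) {
  x <- trimws(as.character(x))
  paste(x[!is.na(x) & nzchar(x)], collapse = ",")
}

before$diagnoses <- apply(before[diag_cols], 1, collapse_row)
before$diagnoses
# "1,3,4" "4" "1" "2" "3,4" "1,3"
```

Selecting the columns by name keeps any other columns in the real dataset out of the pasted string.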
Another possibility could be:
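before$rowid <- 1:nrow(before)
# stack() reshapes the diag columns to long format; aggregate() drops NA rows by default
aggregate(values ~ rowid,
          FUN = paste0, collapse = ",",
          data = data.frame(before[5], stack(before[-5])))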
  rowid values
1     1  1,3,4
2     2      4
3     3      1
4     4      2
5     5    3,4
6     6    1,3
A third option is to paste all the columns together and then strip the NA entries with gsub:
foo = function(..., sep = ","){
  paste(..., sep = sep)
}
gsub(",?NA|NA,?", "", do.call(foo, before))
#[1] "1,3,4" "4" "1" "2" "3,4" "1,3"
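To see why the gsub approach works, it may help to look at the intermediate string. The sketch below splits the one-liner into two steps, using do.call(paste, c(before, sep = ",")) in place of the foo wrapper. The variable names pasted and cleaned are made up for illustration.

```r
# Example data from the question
before <- data.frame(
  diag1 = c(1, NA, 1, NA, NA, 1),
  diag2 = c(NA, NA, NA, 2, NA, NA),
  diag3 = c(3, NA, NA, NA, 3, 3),
  diag4 = c(4, 4, NA, NA, 4, NA)
)

# Step 1: paste the columns element-wise; NA becomes the literal text "NA"
pasted <- do.call(paste, c(before, sep = ","))
pasted
# "1,NA,3,4" "NA,NA,NA,4" "1,NA,NA,NA" "NA,2,NA,NA" "NA,NA,3,4" "1,NA,3,NA"

# Step 2: delete each "NA" together with one adjacent comma
cleaned <- gsub(",?NA|NA,?", "", pasted)
cleaned
```

Because the pattern matches the text "NA" anywhere, it would also alter a code that happens to contain those letters (for example "NA12" would become "12"), so this route is probably best kept for purely numeric codes.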