
Is there an R function to extract only numbers from a comma-separated string with many NA values to create a column with only the numbers?

Tags: string, r, numeric, na

I have a dataset that looks like this:

 before = data.frame(diag1 = c(1,NA, 1, NA, NA, 1), diag2 = c(NA, NA, NA, 2, NA, NA), diag3 = c(3, NA, NA, NA, 3, 3), diag4 = c(4, 4, NA, NA, 4, NA))

  diag1 diag2 diag3 diag4
1     1    NA     3     4
2    NA    NA    NA     4
3     1    NA    NA    NA
4    NA     2    NA    NA
5    NA    NA     3     4
6     1    NA     3    NA

I have been trying to find a solution in which the end result is a new column named "diagnoses" that looks like this:

  diagnoses
1     1,3,4
2         4
3         1
4         2
5       3,4
6       1,3

This is just a much smaller example of my real problem. In the dataset I am working on there are over 70 diagnosis columns, with no more than 3 numeric values in each row. I have tried the strsplit, separate, and unite functions, but I still haven't found an elegant solution.

I have used apply with paste:

dat$diagnoses <- apply(dat[, cols], 1, function(x) paste(na.omit(x), collapse = ", "))

However, it yields a string with many commas.

I tried gsub to remove the extra commas, but I still have not been able to get the result I hoped for.

This is the output I have been able to get: "1,,3,4,," ",,,4,," " 1,,,,," ",2,,,," ",,3,4,," "1,,3,,,"

carol asked Sep 11 '19 17:09


3 Answers

One option is to loop over the rows with apply, drop the NA elements of each row, and paste the remaining values together with toString (which separates them with ", "):

before$new <- apply(before, 1, function(x) toString(x[!is.na(x)]))
before$new
#[1] "1, 3, 4" "4"       "1"       "2"       "3, 4"    "1, 3"   
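
The following is a minimal sketch of the same row-wise idea, not part of the original answer. It restricts the paste to the diagnosis columns, which matters when the data frame has other columns, and joins with "," and no space, as in the desired output. It also treats blank strings as missing, on the assumption (suggested by the "1,,3,4,," output in the question, but not confirmed) that the real data stores empty strings rather than NA. The names diag_cols and collapse_present are hypothetical, and selecting columns by a "diag" prefix is an assumption about the real column names.

```r
# Same toy data as in the question.
before <- data.frame(
  diag1 = c(1, NA, 1, NA, NA, 1),
  diag2 = c(NA, NA, NA, 2, NA, NA),
  diag3 = c(3, NA, NA, NA, 3, 3),
  diag4 = c(4, 4, NA, NA, 4, NA)
)

# Assumption: the diagnosis columns can be selected by a "diag" name prefix.
diag_cols <- grep("^diag", names(before), value = TRUE)

# Hypothetical helper: keep values that are neither NA nor blank,
# then join them with "," (no space).
collapse_present <- function(x) {
  x <- trimws(as.character(x))  # guard against values padded with spaces
  paste(x[!is.na(x) & nzchar(x)], collapse = ",")
}

before$diagnoses <- apply(before[diag_cols], 1, collapse_present)
before$diagnoses
```

Selecting the columns up front also means the call can be re-run safely, since the new diagnoses column is never fed back into the paste.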
akrun answered Nov 14 '22 23:11


Another possibility could be:

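# Add a row identifier, stack the diagnosis columns into long form,
# then paste the values back together per row (aggregate drops NAs by default).
before$rowid <- 1:nrow(before)
aggregate(values ~ rowid,
          data = data.frame(before[5], stack(before[-5])),
          FUN = paste0, collapse = ",")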

  rowid values
1     1  1,3,4
2     2      4
3     3      1
4     4      2
5     5    3,4
6     6    1,3
tmfmnk answered Nov 14 '22 23:11


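# Paste all columns together row-wise, NAs included (e.g. "1,NA,3,4")
foo = function(..., sep = ","){
    paste(..., sep = sep)
}

# Then strip each "NA" together with one adjacent comma
gsub(",?NA|NA,?", "", do.call(foo, before))
#[1] "1,3,4" "4"     "1"     "2"     "3,4"   "1,3"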
d.b answered Nov 15 '22 00:11