I have a data frame dd2 with hundreds of columns, and I need to paste all the column values in each row together, omitting any NA values. If I do something like this
apply(dd2, 1, paste, collapse=",")
it includes the NAs as the string "NA", which I want to avoid. I could also do it as shown below, but then I would have to handle each column individually to get the result.
result <- cbind(
dd2,
combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = ",")
)
Is there an efficient way to do this? Here is the sample data:
dd2 <- structure(c("A", "B", "C", "D", "E", "AK2", "HFM1", NA, "TRR",
"RTT", NA, "PPT", "TRR", "RTT", NA, "PPT", NA, NA, "GGT", NA), .Dim = c(5L,
4L), .Dimnames = list(NULL, c("sample_id", "plant", "animal",
"more")))
You could use na.omit() to drop the NA values, then paste. You could also use toString(), as it is equivalent to paste(..., collapse = ", ").
apply(dd2, 1, function(x) toString(na.omit(x)))
# [1] "A, AK2, PPT" "B, HFM1, PPT" "C, TRR"
# [4] "D, TRR, RTT, GGT" "E, RTT"
If you only need specific columns, subset them first (cols here stands for a vector of the column names or indices you want):
apply(dd2[, cols], 1, function(x) toString(na.omit(x)))
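Note that toString() joins with ", " (comma plus space), whereas the question's own attempt uses a bare ",". Here is a minimal sketch of the same na.omit() idea using paste() instead, with the result bound back onto dd2 as in the question's cbind() call. The cols vector is a hypothetical selection of the three non-ID columns, chosen only for illustration.

```r
# Sample matrix from the question
dd2 <- structure(c("A", "B", "C", "D", "E", "AK2", "HFM1", NA, "TRR",
                   "RTT", NA, "PPT", "TRR", "RTT", NA, "PPT", NA, NA, "GGT", NA),
                 .Dim = c(5L, 4L),
                 .Dimnames = list(NULL, c("sample_id", "plant", "animal", "more")))

# Hypothetical column selection: everything except the ID column
cols <- c("plant", "animal", "more")

# Drop the NAs in each row, then join with "," (no space, unlike toString())
combination <- apply(dd2[, cols, drop = FALSE], 1,
                     function(x) paste(na.omit(x), collapse = ","))

# Append the combined strings as a new column of the matrix
result <- cbind(dd2, combination = combination)
```

drop = FALSE keeps the subset a matrix even if cols names a single column, so apply() still iterates over rows.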
dd2 is a matrix, in which case using apply as suggested by @Rich Scriven is more appropriate. If it is a data frame, you can use tidyr::unite:
dd2 <- data.frame(dd2)
tidyr::unite(dd2, result, plant, animal, more, na.rm = TRUE, sep = ',')
#  sample_id      result
#1         A     AK2,PPT
#2         B    HFM1,PPT
#3         C         TRR
#4         D TRR,RTT,GGT
#5         E         RTT
To combine all the columns, you can use everything().
tidyr::unite(dd2, result, dplyr::everything(), na.rm = TRUE, sep = ',')
#         result
#1     A,AK2,PPT
#2    B,HFM1,PPT
#3         C,TRR
#4 D,TRR,RTT,GGT
#5         E,RTT
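By default unite() drops the columns it combines. The question's cbind() attempt keeps the original columns next to the combined one, so a sketch along the following lines may be closer to what was asked. It assumes the tidyr package is installed and uses unite()'s remove argument; the name combination is taken from the question.

```r
library(tidyr)

# Sample data from the question, converted to a data frame
dd2 <- data.frame(structure(
  c("A", "B", "C", "D", "E", "AK2", "HFM1", NA, "TRR",
    "RTT", NA, "PPT", "TRR", "RTT", NA, "PPT", NA, NA, "GGT", NA),
  .Dim = c(5L, 4L),
  .Dimnames = list(NULL, c("sample_id", "plant", "animal", "more"))))

# remove = FALSE keeps plant, animal and more alongside the new column;
# na.rm = TRUE drops missing values before the values are joined
out <- unite(dd2, combination, plant, animal, more,
             na.rm = TRUE, sep = ",", remove = FALSE)
```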