I have a dataframe called dd2. I need to paste together the values in Left.Gene.Symbols and Right.Gene.Symbols, which I can do with the code below, but I do not want NA pasted in where a value is missing. I want the output to look like the combination column shown in result.
mycode
# convert the string "NA" to a real missing value
dd2[dd2 == "NA"] <- NA
# paste the two columns together
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], dd2[, "Right.Gene.Symbols"], sep = "*"))
data
dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))
result
     customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
[1,] "AMLM12001KP"      "AK2"             NA                 "AK2*"
[2,] "AMLM12001KP"      "HFM1"            "PPT"              "HFM1*PPT"
[3,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"
[4,] "AMLM12001KP"      "HFM1"            "GGT"              "HFM1*GGT"
[5,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"
You could do something like this, temporarily replacing NA values with the empty string "".
cbind(
dd2,
combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*")
)
#      customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
# [1,] "AMLM12001KP"      "AK2"             NA                 "AK2*"
# [2,] "AMLM12001KP"      "HFM1"            "PPT"              "HFM1*PPT"
# [3,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"
# [4,] "AMLM12001KP"      "HFM1"            "GGT"              "HFM1*GGT"
# [5,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"
Of course, you can substitute your column names for the column numbers above. I left them out because they are long.
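For reference, the same idea written with column names might look like the sketch below. The helper name na_to_empty is made up for illustration and is not part of base R; dd2 is the matrix from the question. Applying the helper to both columns also covers the case where the left symbol is missing.

```r
# dd2 exactly as given in the question (a character matrix)
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))

# Hypothetical helper: swap missing values for the empty string
na_to_empty <- function(x) replace(x, is.na(x), "")

# Paste the cleaned copies; dd2 itself keeps its NA values
combination <- paste(
  na_to_empty(dd2[, "Left.Gene.Symbols"]),
  na_to_empty(dd2[, "Right.Gene.Symbols"]),
  sep = "*"
)

result <- cbind(dd2, combination = combination)
```

Because the replacement happens only on the vectors passed to paste(), the NA entries in the original columns are left untouched in result.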
We can use NAer from qdap with sprintf:
library(qdap)
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],''))
#[1] "AK2*" "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*"
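If installing qdap is not an option, a base R sketch along the same lines could use ifelse() to blank out the missing values before formatting. The vectors left and right below are stand-ins for dd2[,2] and dd2[,3], filled with the values from the question.

```r
# Stand-ins for the two gene symbol columns from the question
left  <- c("AK2", "HFM1", "HFM1", "HFM1", "HFM1")
right <- c(NA, "PPT", NA, "GGT", NA)

# sprintf() would format a missing value as the text "NA",
# so replace missing entries with "" first
combination <- sprintf("%s*%s", left, ifelse(is.na(right), "", right))
```

This gives the same strings as the NAer version without the extra dependency.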