Omit NA values while pasting two column values together in R

Tags:

r

I have a character matrix called dd2. I need to paste together the values in Left.Gene.Symbols and Right.Gene.Symbols, which I can do with the code below, but I don't want NA pasted in when a value is missing. I want the output to look like the combination column shown in the result.

My code

# convert the string "NA" to a real missing value
dd2[dd2 == 'NA'] <- NA
# paste the two columns together
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], dd2[, "Right.Gene.Symbols"], sep = "*"))

data

dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP", 
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT", 
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id", 
"Left.Gene.Symbols", "Right.Gene.Symbols")))

result

   customer_sample_id Left.Gene.Symbols Right.Gene.Symbols  combination
[1,] "AMLM12001KP"      "AK2"             NA                    AK2*
[2,] "AMLM12001KP"      "HFM1"           "PPT"                  HFM1*PPT
[3,] "AMLM12001KP"      "HFM1"            NA                    HFM1*
[4,] "AMLM12001KP"      "HFM1"           "GGT"                  HFM1*GGT
[5,] "AMLM12001KP"      "HFM1"            NA                    HFM1* 

MAPK asked Dec 15 '15 04:12

2 Answers

You could do something like this, temporarily replacing the NA values with the empty string "".

cbind(
    dd2, 
    combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*")
)
#      customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
# [1,] "AMLM12001KP"      "AK2"             NA                 "AK2*"      
# [2,] "AMLM12001KP"      "HFM1"            "PPT"              "HFM1*PPT"  
# [3,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"     
# [4,] "AMLM12001KP"      "HFM1"            "GGT"              "HFM1*GGT"  
# [5,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"    

Of course, substitute your column names for the column numbers above; I left them out because they are too long.
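
For reference, here is a minimal sketch of the same idea written with the column names instead of the column numbers. It rebuilds dd2 from the question so it runs on its own; the intermediate variable `right` is just an illustrative name, not something from the original code.

```r
# Rebuild the matrix from the question so this sketch is self-contained.
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))

# Copy the right-hand column and blank out its missing values.
# `right` is an illustrative name; dd2 itself is left unchanged.
right <- dd2[, "Right.Gene.Symbols"]
right[is.na(right)] <- ""

# Paste the left column and the cleaned right column, then bind the result on.
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], right, sep = "*"))
result[, "combination"]
# [1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
```

Because only the copy is modified, the Right.Gene.Symbols column in the result still shows NA where the value was missing.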

Rich Scriven answered Sep 27 '22 21:09


We can use NAer from qdap together with sprintf:

library(qdap)
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],''))
#[1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"   
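
If installing qdap just for this is not an option, a rough base R equivalent might look like the sketch below. It swaps NAer for ifelse() with is.na() and uses paste() rather than sprintf(); `paste_skip_na` is a hypothetical helper name, not a function from qdap or base R, and the two vectors simply repeat the gene-symbol columns from the question.

```r
# Hypothetical helper (not from qdap or base R): turn NA into "" on both
# sides, then paste the two vectors with the given separator.
paste_skip_na <- function(left, right, sep = "*") {
  left  <- ifelse(is.na(left),  "", left)
  right <- ifelse(is.na(right), "", right)
  paste(left, right, sep = sep)
}

# Same values as columns 2 and 3 of dd2 in the question.
left  <- c("AK2", "HFM1", "HFM1", "HFM1", "HFM1")
right <- c(NA, "PPT", NA, "GGT", NA)

combination <- paste_skip_na(left, right)
combination
# [1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
```

Unlike the one-liner above, this version also blanks out missing values in the left column, which may or may not be what you want.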

akrun answered Sep 27 '22 22:09
