
Combine column to remove NA's

Tags: merge, r, na

I have some columns in R, and for each row there will only ever be a value in one of them; the rest will be NAs. I want to combine these into one column containing the non-NA value. Does anyone know of an easy way of doing this? For example, I could have the following:

data <- data.frame('a' = c('A','B','C','D','E'),
                   'x' = c(1,2,NA,NA,NA),
                   'y' = c(NA,NA,3,NA,NA),
                   'z' = c(NA,NA,NA,4,5))

So I would have

'a' 'x' 'y' 'z'
 A   1   NA  NA
 B   2   NA  NA
 C   NA  3   NA
 D   NA  NA  4
 E   NA  NA  5

And I would like to get

'a' 'mycol'
 A   1
 B   2
 C   3
 D   4
 E   5

The names of the columns containing NAs change depending on code earlier in the query, so I won't be able to refer to the columns by name explicitly. However, I have the names of the columns that contain NAs stored as a vector (in this example, cols <- c('x','y','z')), so I could select the columns using data[, cols].

Any help would be appreciated.

Thanks

asked Jan 28 '13 13:01 by user1165199



2 Answers

A solution based on dplyr::coalesce could look like this:

library(dplyr)

data %>%
  mutate(mycol = coalesce(x, y, z)) %>%
  select(a, mycol)
#   a mycol
# 1 A     1
# 2 B     2
# 3 C     3
# 4 D     4
# 5 E     5

Data

data <- data.frame('a' = c('A','B','C','D','E'),
                   'x' = c(1,2,NA,NA,NA),
                   'y' = c(NA,NA,3,NA,NA),
                   'z' = c(NA,NA,NA,4,5))
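The coalesce call in this answer names x, y and z explicitly, whereas the question says the column names are only available in a vector. One possible way to bridge that gap, offered as a sketch rather than as part of the original answer, is to pass the selected columns to coalesce through do.call. The names result and cols are used here for illustration; cols comes from the question.

```r
library(dplyr)

data <- data.frame(a = c("A", "B", "C", "D", "E"),
                   x = c(1, 2, NA, NA, NA),
                   y = c(NA, NA, 3, NA, NA),
                   z = c(NA, NA, NA, 4, 5))

# Column names held in a character vector, as described in the question
cols <- c("x", "y", "z")

# data[cols] is a list of columns; do.call() hands each column to
# coalesce() as a separate unnamed argument, which is equivalent to
# coalesce(data$x, data$y, data$z)
data$mycol <- do.call(coalesce, unname(as.list(data[cols])))

# Keep only the key column and the combined column
result <- data[c("a", "mycol")]
```

Because coalesce works element by element, each row gets its own first non-NA value regardless of which column holds it.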
answered Sep 28 '22 00:09 by MKR


You can use unlist to turn the columns into one vector. Afterwards, na.omit can be used to remove the NAs.

cbind(data[1], mycol = na.omit(unlist(data[-1])))
   a mycol
x1 A     1
x2 B     2
y3 C     3
z4 D     4
z5 E     5
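One caveat worth checking before relying on this: unlist concatenates the columns one after another, so the surviving values come back in column order. That matches the row order in the question's example, but it would not if, say, the first row's value sat in the last column. The sketch below uses a small made-up data frame (not the question's data) to show the mismatch, together with a row-wise base R alternative built on Reduce and ifelse; the names flattened, found and candidate are illustrative.

```r
# Hypothetical data: row A has its value in the last column
data <- data.frame(a = c("A", "B", "C"),
                   x = c(NA, 2, NA),
                   y = c(NA, NA, 3),
                   z = c(1, NA, NA))
cols <- c("x", "y", "z")

# Column-wise flattening returns the values in column order (2, 3, 1),
# which no longer lines up with rows A, B, C
flattened <- as.vector(na.omit(unlist(data[cols])))

# Row-wise alternative: walk the columns left to right, keeping any
# value already found and filling from the next column where it is NA
mycol <- Reduce(function(found, candidate) {
  ifelse(is.na(found), candidate, found)
}, data[cols])

result <- data.frame(a = data$a, mycol = mycol)
```

The Reduce version needs no extra packages and keeps each value on its own row, at the cost of one pass per column.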
answered Sep 28 '22 00:09 by Sven Hohenstein