
Merge two data frames and replace the NA values in R

Tags: merge, dataframe, r

I have a main table (a) containing the columns id, age, and sex, e.g.:

a <- data.frame(id=letters[1:4], age=c(18,NA,9,NA), sex=c("M","F","F","M"))
  id age sex
1  a  18   M
2  b  NA   F
3  c   9   F
4  d  NA   M

And I have a supplementary table (b) that contains only ids from table (a), holding either the ages missing from (a) or duplicates of ages already in (a), e.g.:

b <- data.frame(id=c("a","b","d"), age=c(18,32,20))
  id age
1  a  18
2  b  32
3  d  20

Now I want to merge the two tables, like this:

  id age sex
1  a  18   M
2  b  32   F
3  c   9   F
4  d  20   M

However, I tried merge(a,b,by="id",all=T), and the result is not what I want. Is there any way to solve this problem? Thank you!

Eric Chang asked Nov 27 '15 09:11
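
To illustrate why the merge() call in the question falls short: because age exists in both tables and is not a join key, merge() keeps both copies under suffixed names instead of combining them. A minimal sketch, assuming the same a and b as above (built here with stringsAsFactors = FALSE, which is an assumption and not part of the original data):

```r
# Same example data as in the question; stringsAsFactors = FALSE is an assumption
a <- data.frame(id = letters[1:4], age = c(18, NA, 9, NA),
                sex = c("M", "F", "F", "M"), stringsAsFactors = FALSE)
b <- data.frame(id = c("a", "b", "d"), age = c(18, 32, 20),
                stringsAsFactors = FALSE)

# Full outer join on id: the two age columns are kept side by side
m <- merge(a, b, by = "id", all = TRUE)
names(m)
# "id" "age.x" "sex" "age.y"
```

The NA values are therefore still present in age.x, and the replacement values sit in age.y; a further step is needed to combine the two columns.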




2 Answers

Here is a dplyr solution:

library(dplyr)

merged <- left_join(a, b, by = "id") %>% # this will generate age.x and age.y
  mutate(age = ifelse(is.na(age.x), age.y, age.x)) %>% # we generate a joint 'age' variable
  select(-age.y, -age.x) # drop the superfluous columns

> merged
  id sex age
1  a   M  18
2  b   F  32
3  c   F   9
4  d   M  20

Note that this gives a warning about joining factors with different levels. This is because the example data was created with stringsAsFactors = TRUE, which was the default for data.frame() before R 4.0.0. With character id columns the warning does not appear.

Felix answered Sep 28 '22 05:09
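
The ifelse() step in the dplyr answer above can also be written with dplyr::coalesce(), which takes the first non-missing value elementwise. This is a sketch of a variant, not the answerer's code; the names merged, age.a and age.b are chosen here for illustration, and the data is assumed to be built with stringsAsFactors = FALSE so that no factor warning appears:

```r
library(dplyr)

# Same example data as in the question; stringsAsFactors = FALSE is an assumption
a <- data.frame(id = letters[1:4], age = c(18, NA, 9, NA),
                sex = c("M", "F", "F", "M"), stringsAsFactors = FALSE)
b <- data.frame(id = c("a", "b", "d"), age = c(18, 32, 20),
                stringsAsFactors = FALSE)

merged <- left_join(a, b, by = "id", suffix = c(".a", ".b")) %>%
  mutate(age = coalesce(age.a, age.b)) %>%  # keep a's age, fall back to b's
  select(id, age, sex)                      # drop helpers, restore column order

merged
```

Selecting id, age, sex explicitly also keeps the column order of the original table, which the select(-age.y, -age.x) form does not.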


We can use data.table:

library(data.table)
# join b onto a by id, copy b's age into a temporary column, fill the NAs, then drop it
setDT(a)[b, agei := i.age, on = 'id'][is.na(age), age := agei][, agei := NULL][]
a
#   id age sex
#1:  a  18   M
#2:  b  32   F
#3:  c   9   F
#4:  d  20   M

akrun answered Sep 28 '22 05:09
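
The temporary agei column in the data.table answer can likely be avoided by combining the update join with data.table::fcoalesce(), which returns the first non-missing value elementwise. This is a sketch of a variant, not the answerer's code, and it assumes a data.table version that provides fcoalesce() (1.12.4 or later):

```r
library(data.table)

# Same example data as in the question, built directly as data.tables
a <- data.table(id = letters[1:4], age = c(18, NA, 9, NA),
                sex = c("M", "F", "F", "M"))
b <- data.table(id = c("a", "b", "d"), age = c(18, 32, 20))

# Update join: for rows of a matched by b on id, keep a's age where present,
# otherwise take b's age (i.age). Rows of a without a match are left untouched.
a[b, on = "id", age := fcoalesce(age, i.age)]

a
```

Because := modifies a by reference, no reassignment is needed, and the column order of a is unchanged.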