Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace values from another dataframe by IDs

Tags:

dataframe

r

I have two data frames::

as1 <- data.frame(ID = c(1,2,3,4,5,6),
                  pID = c(21,22,23,24,25,26),
                  Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),
                  pid = c(24,25),
                  Values = c(544, 676))

I need to replace the NA values in as1 with those in as2 by matching ID and pID

I need to get the result data frame as:

  resultdf
    ID pID Values
    1  1  21    435
    2  2  22     33
    3  3  23     45
    4  4  24    544
    5  5  25    676
    6  6  26     12

I tried doing subset and then na.omit() and then rbinding... but I'm losing the index.

like image 428
Tarak Avatar asked Oct 21 '16 12:10

Tarak


4 Answers

Here are two base R solutions.

First, using match in "ID" to select the elements of "Value" in as1 to fill in:

as1$Values[match(as2$ID, as1$ID)] <- as2$Values

as1
  ID pID Values
1  1  21    435
2  2  22     33
3  3  23     45
4  4  24    544
5  5  25    676
6  6  26     12

This only works if ID is the true ID for both data sets (that is, pid is "irrelevant"). Second, in the case that pid is also needed, you could use merge and then "collapse" the two values columns as follows:

df <- merge(as1, as2, by.x=c("ID", "pID"), by.y=c("ID", "pid"), all=TRUE)

This produces a four column data frame with two values columns. Collapse these into a single column with ifelse:

df <- cbind(df[c(1,2)], "Values"=with(df, ifelse(is.na(Values.y), Values.x, Values.y)))

df
  ID pID Values
1  1  21    435
2  2  22     33
3  3  23     45
4  4  24    544
5  5  25    676
6  6  26     12
like image 173
lmo Avatar answered Oct 11 '22 13:10

lmo


Or another option is data.table

library(data.table)
setDT(as1)[as2, Values := i.Values , on =.(ID, pid)]
as1
#    ID pid Values
#1:  1  21    435
#2:  2  22     33
#3:  3  23     45
#4:  4  24    544
#5:  5  25    676
#6:  6  26     12
like image 42
akrun Avatar answered Oct 11 '22 13:10

akrun


Try with dplyr library. Firstly join both tables and then create new column with values instead NA's:

library("dplyr")

as1 <- data.frame(ID = c(1,2,3,4,5,6), pid = c(21,22,23,24,25,26),Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),pid = c(24,25), Values = c(544, 676))

left_join(as1, as2, by = c("ID", "pid")) %>% 
    mutate(Values = ifelse(is.na(Values.x), Values.y, Values.x)) %>% 
    select(ID, pid, Values)

# ID pid Values
# 1  1  21    435
# 2  2  22     33
# 3  3  23     45
# 4  4  24    544
# 5  5  25    676
# 6  6  26     12
like image 36
Marta Avatar answered Oct 11 '22 14:10

Marta


Following Marta's dplyr solution, we can use coalesce instead to combine the merged Value.x and Value.y:

library(dplyr)
res <- as1 %>% left_join(as2, by = c("ID", "pID"="pid")) %>%
               mutate(Values=coalesce(Values.x,Values.y)) %>%
               select(-Values.x,-Values.y)
##  ID pID Values
##1  1  21    435
##2  2  22     33
##3  3  23     45
##4  4  24    544
##5  5  25    676
##6  6  26     12

Note also the syntax used for the by parameter in left_join. Here, we join by ID and by pID for as1 and pid for as2.

like image 36
aichao Avatar answered Oct 11 '22 14:10

aichao