a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
> a
id v1 v2
1 1 a <NA>
2 2 <NA> b
3 3 <NA> c
> b
id v1 v2
1 1 <NA> A
2 2 B <NA>
3 3 C <NA>
note: There are no ids for which v1 or v2 are defined in both tables; there is only a single unique non-NA value in each column for each id value
ab <- merge(a, b, by = "id")
v1
and v2
, so that the data.frame ab
will look like this:ab <- data.frame(id = 1:3, v1 = c("a", "B", "C"), v2 = c("A", "b", "c"))
> ab
id v1 v2
1 1 a A
2 2 B b
3 3 C c
> merge(a, b, by = "id")
id v1.x v2.x v1.y v2.y
1 1 a <NA> <NA> A
2 2 <NA> b B <NA>
3 3 <NA> c C <NA>
data.frame
and data.table
, so here are the data.table versions of above:A <- data.table(a, key = 'id')
B <- data.table(b, key = 'id')
A[B]
Join DataFrames using their indexes. If we want to join using the key columns, we need to set key to be the index in both df and other . The joined DataFrame will have key as its index. Another option to join using the key columns is to use the on parameter.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.
In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens. merge() function works similarly like join in DBMS.
The type of merge you specify probably won't be possible using merge
(with data frames), although saying that usually invites being proved wrong.
You also omit some details: will there always be a single unique non-NA
value in each column for each id
value? If so, this will work:
ab <- rbind(a,b)
> colFun <- function(x){x[which(!is.na(x))]}
> ddply(ab,.(id),function(x){colwise(colFun)(x)})
id v1 v2
1 1 a A
2 2 B b
3 3 C c
A similar strategy should work with data.table
s as well:
abDT <- data.table(ab,key = "id")
> abDT[,list(colFun(v1),colFun(v2)),by = id]
id V1 V2
[1,] 1 a A
[2,] 2 B b
[3,] 3 C c
If your data is as simple as it is above joran's answer is likely the simplest way. Here's may approach in base:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
data.frame(mapply(a, b, FUN = decider))
If your data has different id's (some overlap and some do not, then here's a different approach:
a <- data.frame(id = c(1,2,4,5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"))
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
DF <- data.frame(mapply(a, b, FUN = decider))
DF2 <- rbind(b[!b$id %in% DF$id , ], DF)
DF2 <- DF2[order(DF2$id), ]
rownames(DF2) <- 1:nrow(DF2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With