join matching columns in a data.frame or data.table

I have the following data.frames:

a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
> a
  id   v1   v2
1  1    a <NA>
2  2 <NA>    b
3  3 <NA>    c
> b
  id   v1   v2
1  1 <NA>    A
2  2    B <NA>
3  3    C <NA>

note: There are no ids for which v1 or v2 are defined in both tables; there is only a single unique non-NA value in each column for each id value

I would like to merge these data frames on matching values of "id":

ab <- merge(a, b, by = "id")

but I would also like to combine the two columns v1 and v2, so that the data.frame ab will look like this:

ab <- data.frame(id = 1:3, v1 = c("a", "B", "C"), v2 = c("A", "b", "c"))

> ab
  id v1 v2
1  1  a  A
2  2  B  b
3  3  C  c

instead, I get this:

> merge(a, b, by = "id")
  id v1.x v2.x v1.y v2.y
1  1    a <NA> <NA>    A
2  2 <NA>    b    B <NA>
3  3 <NA>    c    C <NA>

It would be helpful to have examples using both data.frame and data.table, so here are the data.table versions of the above:

library(data.table)
A <- data.table(a, key = 'id')
B <- data.table(b, key = 'id')
A[B]
David LeBauer asked Mar 29 '12 02:03



2 Answers

The type of merge you specify probably won't be possible using merge (with data frames), although saying that usually invites being proved wrong.

You also omit some details: will there always be a single unique non-NA value in each column for each id value? If so, this will work:

> library(plyr)
> ab <- rbind(a, b)
> colFun <- function(x){x[which(!is.na(x))]}
> ddply(ab,.(id),function(x){colwise(colFun)(x)})
  id v1 v2
1  1  a  A
2  2  B  b
3  3  C  c
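
For readers without plyr, the same stack-then-collapse idea can be sketched in base R with aggregate(). This is only an illustration and not part of the original answer: the helper name first_non_na is made up here, and the sketch assumes at most one non-NA value per id and column, as the question states. stringsAsFactors = FALSE is set explicitly so the columns stay character regardless of R version.

```r
a <- data.frame(id = 1:3, v1 = c("a", NA, NA), v2 = c(NA, "b", "c"),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, "B", "C"), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

# Stack both tables, then collapse each id group to one row
ab <- rbind(a, b)

# Hypothetical helper: first non-NA value, or NA if the group has none
first_non_na <- function(x) x[!is.na(x)][1]

# The data.frame method of aggregate() passes NA values through to FUN
res <- aggregate(ab[c("v1", "v2")], by = ab["id"], FUN = first_non_na)
res
```

The trailing [1] guarantees a length-one result per group, so a group with no non-NA value yields NA rather than a zero-length vector.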

A similar strategy should work with data.tables as well:

> library(data.table)
> abDT <- data.table(ab,key = "id")
> abDT[,list(colFun(v1),colFun(v2)),by = id]
     id V1 V2
[1,]  1  a  A
[2,]  2  B  b
[3,]  3  C  c
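
Two variations on the data.table approach, offered as a hedged sketch rather than as the answer's own method. The first applies a helper to every non-key column through .SD, which keeps the original column names instead of V1 and V2. The second is an update join that fills the gaps in one table from the other with fcoalesce(), which assumes a reasonably recent data.table release. The helper name first_non_na is invented for this example.

```r
library(data.table)

a <- data.table(id = 1:3, v1 = c("a", NA, NA), v2 = c(NA, "b", "c"))
b <- data.table(id = 1:3, v1 = c(NA, "B", "C"), v2 = c("A", NA, NA))

# Variation 1: stack, then take the first non-NA value per id and column
first_non_na <- function(x) x[!is.na(x)][1]
abDT <- rbindlist(list(a, b))
res <- abDT[, lapply(.SD, first_non_na), keyby = id]

# Variation 2: update join; the i. prefix refers to columns of b
A <- copy(a)  # copy so that := does not modify a in place
A[b, on = "id", `:=`(v1 = fcoalesce(v1, i.v1),
                     v2 = fcoalesce(v2, i.v2))]
```

The update join only touches ids present in both tables, so ids that exist only in b would not be added to A.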
joran answered Oct 04 '22 22:10


If your data is as simple as it is above, joran's answer is likely the simplest way. Here's my approach in base:

a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))

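# Take x where it is not NA, otherwise fall back to y
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
# mapply pairs the columns of a and b by position, so both must list the same ids in the same row order
data.frame(mapply(a, b, FUN = decider))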

If your data has different ids (some overlap and some do not), then here's a different approach:

a <- data.frame(id = c(1,2,4,5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"))
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA))

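decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))

# Caveat: mapply still pairs rows by position, not by id, and DF keeps the ids from a
DF <- data.frame(mapply(a, b, FUN = decider))
# Append the rows of b whose ids are missing from DF, then sort by id and renumber the rows
DF2 <- rbind(b[!b$id %in% DF$id , ], DF)
DF2 <- DF2[order(DF2$id), ]
rownames(DF2) <- 1:nrow(DF2)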
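
Because mapply pairs rows by position, an alternative when the ids only partly overlap is to let merge() align the rows by id first and then combine each pair of suffixed columns. The following is a minimal sketch under stated assumptions, not the answer's own method: coalesce2 is a hypothetical helper name, and when both tables hold a value for the same id (as with id 4 here), the value from a wins.

```r
a <- data.frame(id = c(1, 2, 4, 5), v1 = c("a", NA, "q", NA),
                v2 = c(NA, "b", "c", "e"), stringsAsFactors = FALSE)
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", "B"),
                v2 = c("A", NA, "D", NA), stringsAsFactors = FALSE)

# Full outer join on id; shared columns come back as v1.x, v2.x, v1.y, v2.y
m <- merge(a, b, by = "id", all = TRUE)

# Hypothetical helper: keep x where present, otherwise take y
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

ab <- data.frame(id = m$id,
                 v1 = coalesce2(m$v1.x, m$v1.y),
                 v2 = coalesce2(m$v2.x, m$v2.y),
                 stringsAsFactors = FALSE)
ab
```

Ids found in only one table keep whatever values that table has, so id 5 ends up with NA in v1 because b has no row for it.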
Tyler Rinker answered Oct 04 '22 23:10