
Merge by column name R

Tags:

merge

r

matrix

I've got

a <- matrix(c(1,3,4,2,2,6,3,1,6), nrow = 3, ncol=3, byrow=TRUE, dimnames = list(NULL, c("Pears", "Apples", "Oranges")))

  Pears Apples Oranges
1     1      3       4
2     2      2       6
3     3      1       6

b <- matrix(c(1,4,2,6,3,6), nrow = 3, ncol=2, byrow=TRUE, dimnames = list(NULL, c("Pears", "Oranges")))

  Pears Oranges
1     1       4
2     2       6
3     3       6

I want to merge them to get a result as such:

  Pears Apples Oranges
1     1      3       4
2     2      2       6
3     3      1       6
4     1     NA       4
5     2     NA       6
6     3     NA       6

I.e., combine them by column name and leave NAs where the second matrix lacks values, for the general case of a bigger matrix a and a smaller matrix b.

rbind does not work, merge does something weird. What am I after? I could do with the most memory efficient thing available too, as this is eventually going to be done a large number of times, with a lot of column names.

Thanks,

-N

EDIT: I probably should have mentioned this when I asked originally, but I actually want to achieve the exact effect as above, but with some pretty major caveats:

I'm using matrices

The first matrix will always contain all of the second's colnames, plus some extra ones

I possibly want to create a big.matrix from package bigmemory.

N. McA. asked Jan 25 '26 22:01


2 Answers

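Here is a more generic approach, in case there are multiple columns in a or b that need to be added. It works on data frames, so convert the matrices first:

# names() is NULL for a matrix, and assigning to a missing
# column by name only works on a data frame
a <- as.data.frame(a)
b <- as.data.frame(b)

b.toAdd <- setdiff(names(a), names(b))
if (length(b.toAdd))
  b[, b.toAdd] <- NA

a.toAdd <- setdiff(names(b), names(a))
if (length(a.toAdd))
  a[, a.toAdd] <- NA

# rbind on data frames matches columns by name
rbind(a, b)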

Update:

Just noticed your comment about needing memory efficiency. In that case, you probably want to use data.table, since adding columns with <- will create unnecessary copies.
data.table instead has a := operator, which adds columns by reference and is significantly more efficient.

library(data.table)
a <- data.table(a)
b <- data.table(b)


if (length(b.toAdd <- setdiff (names(a), names(b))))
    b[, c(b.toAdd) := NA]

if (length(a.toAdd <- setdiff (names(b), names(a))))
    a[, c(a.toAdd) := NA]

rbind(a, b, use.names=TRUE)

#    Pears Apples Oranges
# 1:     1      3       4
# 2:     2      2       6
# 3:     3      1       6
# 4:     1     NA       4
# 5:     2     NA       6
# 6:     3     NA       6

Search SO for "[r] data.table benchmarks" to get an idea of the improvements.
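As a possible shortcut, and assuming a reasonably recent data.table is installed, rbindlist() with fill = TRUE pads missing columns with NA in one call, so the setdiff bookkeeping is not needed. This is only a sketch that rebuilds the question's a and b; it is not a benchmarked recommendation.

```r
library(data.table)

# Same example data as in the question (values taken from the printed tables)
a <- matrix(c(1, 3, 4, 2, 2, 6, 3, 1, 6), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("Pears", "Apples", "Oranges")))
b <- matrix(c(1, 4, 2, 6, 3, 6), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("Pears", "Oranges")))

# use.names = TRUE matches columns by name;
# fill = TRUE adds any column missing from one table as NA
res <- rbindlist(list(as.data.table(a), as.data.table(b)),
                 use.names = TRUE, fill = TRUE)
res
```

The result is a data.table, so call as.matrix(res) if a plain matrix is needed afterwards.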

Ricardo Saporta answered Jan 27 '26 10:01


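You could use cbind to add the missing column. Note that rbind on matrices binds columns by position, not by name, so reorder the columns to match a before binding:

rbind(a, cbind(b, Apples=NA)[, colnames(a)])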
  Pears Apples Oranges
1     1      3       4
2     2      2       6
3     3      1       6
4     1     NA       4
5     2     NA       6
6     3     NA       6
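If more than one column may be missing and everything should stay a matrix, the same idea can be sketched as below. The helper name rbind_by_name is made up for illustration, and the sketch assumes numeric matrices where every column name of b also appears in a (as the question's edit states). It allocates the result once and fills it by column name.

```r
# Same example data as in the question (values taken from the printed tables)
a <- matrix(c(1, 3, 4, 2, 2, 6, 3, 1, 6), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("Pears", "Apples", "Oranges")))
b <- matrix(c(1, 4, 2, 6, 3, 6), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("Pears", "Oranges")))

# Hypothetical helper: stack b under a, matching columns by name.
# Assumption: numeric data, and colnames(b) is a subset of colnames(a).
rbind_by_name <- function(a, b) {
  stopifnot(all(colnames(b) %in% colnames(a)))
  # Allocate the full result once, pre-filled with NA
  out <- matrix(NA_real_, nrow = nrow(a) + nrow(b), ncol = ncol(a),
                dimnames = list(NULL, colnames(a)))
  out[seq_len(nrow(a)), ] <- a
  # Fill only the columns that b actually has; the rest stay NA
  out[nrow(a) + seq_len(nrow(b)), colnames(b)] <- b
  out
}

res <- rbind_by_name(a, b)
res
```

Whether this is cheaper than cbind plus rbind in practice would need measuring on the real data; the point of the sketch is only that the columns are matched by name rather than by position.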
johannes answered Jan 27 '26 10:01


