Setting incomparables in place with merge

Tags:

r

I'm seeing some unexpected (or at least not entirely intuitive) behaviour with merge, but perhaps I just don't understand how it's supposed to work:

Let's create some dummy data to play with first:

x <- structure(list(A = c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), B = c(2L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L
), C = c(2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L), D = c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), E = c(2L, 
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L), F = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), G = c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), 
    H = c(1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
    1L, 2L, 1L, 2L, 1L, 1L, 1L), I = c(1L, 1L, 2L, 2L, 2L, 1L, 
    1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L), 
    J = c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    2L, 2L, 2L, 2L, 1L, 2L, 1L), K = c(3, 3, 1, 3, 1, 3, 1, 2, 
    2, 2, 1, 3, 2, 2, 2, 1, NA, 1, 2, 1)), .Names = c("A", "B", 
"C", "D", "E", "F", "G", "H", "I", "J", "K"), row.names = c(NA, 
20L), class = "data.frame")

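# Generate a listing of all possible combinations of A-J (2^10 = 1024 rows)
y <- expand.grid(rep(list(1:2), 10))
colnames(y) <- LETTERS[1:10]
# Three copies of the grid, one for each value of K
y <- rbind(y, y, y)
y$K <- rep(1:3, each = 1024)
# mergekey is random, so its values differ from run to run
y$mergekey <- sample(1:6, 3072, replace = TRUE)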

My expectation was that, when merging these two data sets, setting sort=FALSE and all.x=TRUE would give me every row of x, in its original order, with mergekey attached.
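The expectation rests on y containing each A-K key combination exactly once, so a left join should return one row per row of x. A quick check, rebuilding y as above but without the random mergekey column:

```r
# Rebuild the key columns of y exactly as in the question.
y <- expand.grid(rep(list(1:2), 10))
colnames(y) <- LETTERS[1:10]
y <- rbind(y, y, y)
y$K <- rep(1:3, each = 1024)

# Each copy of the 1024 A-J combinations gets its own K value,
# so no (A, ..., K) key combination is repeated.
nrow(y)                          # 3072
anyDuplicated(y[LETTERS[1:11]])  # 0, i.e. no duplicated keys
```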

Let's try that:

merge(x,y,all.x=TRUE,sort=FALSE)
   A B C D E F G H I J  K mergekey
1  2 2 2 2 2 1 2 1 1 2  3        5
2  2 2 1 1 1 1 2 2 1 1  3        3
3  2 1 2 2 1 1 2 1 2 2  1        3
4  2 2 1 2 2 1 2 2 2 2  3        2
5  1 1 2 2 2 2 2 1 2 2  1        4
6  2 1 1 1 2 2 2 2 1 2  3        6
7  1 1 1 1 2 2 2 2 1 2  1        5
8  2 1 2 2 1 1 2 2 1 1  2        4
9  2 2 2 1 1 1 2 1 2 2  2        4
10 2 1 2 2 1 1 2 1 1 1  2        2
11 2 1 2 1 1 1 2 1 2 2  1        4
12 2 2 1 2 1 2 2 1 2 1  3        5
13 2 1 2 1 1 1 2 1 2 2  2        3
14 2 1 2 1 1 1 2 1 2 2  2        3
15 2 2 2 1 2 1 2 1 2 2  2        1
16 2 1 1 2 1 1 2 2 2 2  2        1
17 2 1 1 1 1 1 2 1 1 2  1        2
18 1 2 1 1 1 2 2 1 1 1  1        5
19 2 1 2 1 1 1 2 1 1 1  1        4
20 2 2 1 2 1 1 1 2 1 2 NA       NA

Now it seems that "most of x is unsorted", but the incomparables (the row of x with no match in y, here row 17, where K is NA) are pushed to the end instead of keeping their original position.
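The same effect can be reproduced on a tiny, hypothetical data set (the column names id and val are made up for illustration). ?merge only promises that with sort = FALSE the rows come back "in an unspecified order"; in practice, base R appends the rows of x that have no match after the matched rows:

```r
# Hypothetical toy data: the middle row of x has no match in y.
x <- data.frame(id = c(1, 99, 2))
y <- data.frame(id = c(1, 2), val = c("a", "b"))

res <- merge(x, y, all.x = TRUE, sort = FALSE)
print(res)
# The unmatched id 99 (val = NA) ends up in the last row,
# not in the second position it had in x.
```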

So, my question is: How do I get the incomparables to stay in place?
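One base-R workaround, sketched here on hypothetical toy data (row_id is an illustrative helper column, not part of the original data), is to record each row's position in x before merging and reorder by it afterwards:

```r
# Hypothetical toy data: the middle row of x has no match in y.
x <- data.frame(id = c(1, 99, 2))
y <- data.frame(id = c(1, 2), val = c("a", "b"))

# Remember each row's original position in x.
x$row_id <- seq_len(nrow(x))

# row_id exists only in x, so merge still joins on "id" alone.
res <- merge(x, y, all.x = TRUE, sort = FALSE)

# Restore x's original order, then drop the helper column.
res <- res[order(res$row_id), ]
res$row_id <- NULL
rownames(res) <- NULL
print(res)
```

With the question's x and y, the same three steps (add row_id to x, merge, reorder by row_id) should put row 17 back in its original position.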

PS: Doesn't it seem a little unintuitive to push the incomparables to the end when merge has been told not to sort? I don't find this congruent with this behaviour either.

Brandon Bertelsen asked Aug 24 '12 17:08


1 Answer

The join() function in the plyr package solves this problem intuitively, without any additional arguments: unlike merge, it preserves the row order of x, so the unmatched row 17 stays in place.

library(plyr)
join(x,y)

Joining by: A, B, C, D, E, F, G, H, I, J, K
   A B C D E F G H I J  K mergekey
1  2 2 2 2 2 1 2 1 1 2  3        4
2  2 2 1 1 1 1 2 2 1 1  3        3
3  2 1 2 2 1 1 2 1 2 2  1        5
4  2 2 1 2 2 1 2 2 2 2  3        3
5  1 1 2 2 2 2 2 1 2 2  1        6
6  2 1 1 1 2 2 2 2 1 2  3        6
7  1 1 1 1 2 2 2 2 1 2  1        4
8  2 1 2 2 1 1 2 2 1 1  2        2
9  2 2 2 1 1 1 2 1 2 2  2        4
10 2 1 2 2 1 1 2 1 1 1  2        6
11 2 1 2 1 1 1 2 1 2 2  1        1
12 2 2 1 2 1 2 2 1 2 1  3        3
13 2 1 2 1 1 1 2 1 2 2  2        2
14 2 2 2 1 2 1 2 1 2 2  2        6
15 2 1 1 2 1 1 2 2 2 2  2        2
16 2 1 1 1 1 1 2 1 1 2  1        3
17 2 2 1 2 1 1 1 2 1 2 NA       NA
18 1 2 1 1 1 2 2 1 1 1  1        1
19 2 1 2 1 1 1 2 1 2 2  2        2
20 2 1 2 1 1 1 2 1 1 1  1        1
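The same behaviour on a small, hypothetical data set (id and val are illustrative names). This sketch assumes plyr is installed and skips the join otherwise; type = "left" is join()'s default and is spelled out only for clarity:

```r
# Hypothetical toy data: the middle row of x has no match in y.
x <- data.frame(id = c(1, 99, 2))
y <- data.frame(id = c(1, 2), val = c("a", "b"))

if (requireNamespace("plyr", quietly = TRUE)) {
  # Left join: keep every row of x, in x's original order.
  res <- plyr::join(x, y, by = "id", type = "left")
  print(res)  # id 99 stays in the second row, with val = NA
}
```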
Brandon Bertelsen answered Oct 04 '22 01:10