I'm seeing some unexpected (or at least not entirely intuitive) behaviour with merge, but perhaps I'm just not understanding how it's supposed to work.
Let's create some dummy data to play with first:
x <- structure(list(A = c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), B = c(2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L
), C = c(2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L), D = c(2L, 1L, 2L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), E = c(2L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L), F = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), G = c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L),
H = c(1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 2L, 1L, 1L, 1L), I = c(1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L),
J = c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L), K = c(3, 3, 1, 3, 1, 3, 1, 2,
2, 2, 1, 3, 2, 2, 2, 1, NA, 1, 2, 1)), .Names = c("A", "B",
"C", "D", "E", "F", "G", "H", "I", "J", "K"), row.names = c(NA,
20L), class = "data.frame")
# Generate Listing of All Possible Combinations
y <- expand.grid(rep(list(1:2), 10))
colnames(y) <- LETTERS[1:10]
y <- rbind(y,y,y)
y$K <- rep(1:3,each=1024)
y$mergekey <- sample(1:6,3072,replace=TRUE)
My expectation is that merging these two data sets with sort=FALSE and all.x=TRUE would give me every row of x, in its original order, with mergekey attached.
Let's try that:
merge(x,y,all.x=TRUE,sort=FALSE)
A B C D E F G H I J K mergekey
1 2 2 2 2 2 1 2 1 1 2 3 5
2 2 2 1 1 1 1 2 2 1 1 3 3
3 2 1 2 2 1 1 2 1 2 2 1 3
4 2 2 1 2 2 1 2 2 2 2 3 2
5 1 1 2 2 2 2 2 1 2 2 1 4
6 2 1 1 1 2 2 2 2 1 2 3 6
7 1 1 1 1 2 2 2 2 1 2 1 5
8 2 1 2 2 1 1 2 2 1 1 2 4
9 2 2 2 1 1 1 2 1 2 2 2 4
10 2 1 2 2 1 1 2 1 1 1 2 2
11 2 1 2 1 1 1 2 1 2 2 1 4
12 2 2 1 2 1 2 2 1 2 1 3 5
13 2 1 2 1 1 1 2 1 2 2 2 3
14 2 1 2 1 1 1 2 1 2 2 2 3
15 2 2 2 1 2 1 2 1 2 2 2 1
16 2 1 1 2 1 1 2 2 2 2 2 1
17 2 1 1 1 1 1 2 1 1 2 1 2
18 1 2 1 1 1 2 2 1 1 1 1 5
19 2 1 2 1 1 1 2 1 1 1 1 4
20 2 2 1 2 1 1 1 2 1 2 NA NA
Now it seems that most of x is left unsorted, but the incomparables (here the row with K = NA, which has no match in y) are pushed to the end rather than keeping their position.
So, my question is: How do I get the incomparables to stay in place?
PS: Does it not seem a little unintuitive to push incomparables to the end if the merge has been told not to sort? I don't find this congruent with this behaviour either
The join function in the plyr package solves this problem intuitively, without additional arguments.
library(plyr)
join(x,y)
Joining by: A, B, C, D, E, F, G, H, I, J, K
A B C D E F G H I J K mergekey
1 2 2 2 2 2 1 2 1 1 2 3 4
2 2 2 1 1 1 1 2 2 1 1 3 3
3 2 1 2 2 1 1 2 1 2 2 1 5
4 2 2 1 2 2 1 2 2 2 2 3 3
5 1 1 2 2 2 2 2 1 2 2 1 6
6 2 1 1 1 2 2 2 2 1 2 3 6
7 1 1 1 1 2 2 2 2 1 2 1 4
8 2 1 2 2 1 1 2 2 1 1 2 2
9 2 2 2 1 1 1 2 1 2 2 2 4
10 2 1 2 2 1 1 2 1 1 1 2 6
11 2 1 2 1 1 1 2 1 2 2 1 1
12 2 2 1 2 1 2 2 1 2 1 3 3
13 2 1 2 1 1 1 2 1 2 2 2 2
14 2 2 2 1 2 1 2 1 2 2 2 6
15 2 1 1 2 1 1 2 2 2 2 2 2
16 2 1 1 1 1 1 2 1 1 2 1 3
17 2 2 1 2 1 1 1 2 1 2 NA NA
18 1 2 1 1 1 2 2 1 1 1 1 1
19 2 1 2 1 1 1 2 1 2 2 2 2
20 2 1 2 1 1 1 2 1 1 1 1 1
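For comparison, here is a minimal base R sketch of one way to keep the rows of x in place without plyr: tag each row with its original position, merge, then reorder on that tag. The small x and y below and the helper column .row_id are made up for illustration, and the sketch assumes each row of x matches at most one row of y. As far as I can tell, merge() appends unmatched x rows after the matched ones, and its documentation describes the row order under sort=FALSE as unspecified, so restoring the order explicitly seems safer than relying on it.

```r
# Toy data (hypothetical): the second row of x has K = NA and no match in y.
x <- data.frame(A = c(1L, 2L, 1L, 2L), K = c(1, NA, 2, 1))
y <- data.frame(A = c(1L, 2L, 1L, 2L), K = c(1, 1, 2, 2),
                mergekey = c(10L, 20L, 30L, 40L))

# Record the original row position of x in a helper column (name is arbitrary).
x$.row_id <- seq_len(nrow(x))

# Left join on the shared columns A and K; .row_id only exists in x,
# so it is carried along rather than used as a key.
merged <- merge(x, y, all.x = TRUE, sort = FALSE)

# Restore the original order of x, then drop the helper column.
merged <- merged[order(merged$.row_id), ]
merged$.row_id <- NULL
rownames(merged) <- NULL

print(merged)
```

With the toy data, the unmatched row stays in second position with mergekey NA instead of moving to the end. The same three steps (add index, merge, reorder) should carry over to the x and y above.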