I'm getting some really bizarre stuff while trying to merge multiple data frames. Help!
I need to merge a bunch of data frames by the columns 'RID' and 'VISCODE'. Here is an example of what it looks like:
# ID holds random numbers that are not used for matching: sample(1:100, n)
# draws n values from 1..100, one per row (d1 has 5 rows, the others 9)
d1 = data.frame(ID = sample(1:100, 5), RID = c(2, 5, 7, 9, 12),
                VISCODE = rep('bl', 5),
                value1 = rep(16, 5))
d2 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 7, 7, 7),
                VISCODE = rep(c('bl', 'm06', 'm12'), 3),
                value2 = rep(100, 9))
d3 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 9, 9, 9),
                VISCODE = rep(c('bl', 'm06', 'm12'), 3),
                value3 = rep("a", 9),
                values3.5 = rep("c", 9))
d4 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 5, 5, 5, 7, 7, 7, 9),
                VISCODE = c(c('bl', 'm12'), rep(c('bl', 'm06', 'm12'), 2), 'bl'),
                value4 = rep("b", 9))
dataList = list(d1, d2, d3, d4)
I looked at the answers to the question titled "Merge several data.frames into one data.frame with a loop." I used the Reduce() approach suggested there as well as a loop I wrote:
try1 = mymerge(dataList)
try2 <- Reduce(function(x, y) merge(x, y, all = TRUE,
                                    by = c("RID", "VISCODE")),
               dataList, accumulate = FALSE)
where dataList is a list of data frames and mymerge is:
mymerge = function(dataList){
  mdat = dataList[[1]]
  # fold each remaining data frame into the running result (full outer join)
  for(i in 2:length(dataList)){
    mdat = merge(mdat, dataList[[i]], by = c("RID", "VISCODE"), all = TRUE)
  }
  mdat
}
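A self-contained sketch that rebuilds the corrected test data (with set.seed() added only so the random ID columns are reproducible) and runs both approaches, checking that they agree as described in the post. Since ID appears in all four tables, merge() may print a warning about duplicated column names here.

```r
set.seed(1)  # only so the random ID columns are reproducible

d1 <- data.frame(ID = sample(1:100, 5), RID = c(2, 5, 7, 9, 12),
                 VISCODE = rep("bl", 5), value1 = rep(16, 5))
d2 <- data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 7, 7, 7),
                 VISCODE = rep(c("bl", "m06", "m12"), 3), value2 = rep(100, 9))
d3 <- data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 9, 9, 9),
                 VISCODE = rep(c("bl", "m06", "m12"), 3),
                 value3 = rep("a", 9), values3.5 = rep("c", 9))
d4 <- data.frame(ID = sample(1:100, 9), RID = c(2, 2, 5, 5, 5, 7, 7, 7, 9),
                 VISCODE = c("bl", "m12", rep(c("bl", "m06", "m12"), 2), "bl"),
                 value4 = rep("b", 9))
dataList <- list(d1, d2, d3, d4)

# Loop version: fold each data frame into the running result (full outer join)
mymerge <- function(dataList, keys = c("RID", "VISCODE")) {
  mdat <- dataList[[1]]
  for (i in seq_along(dataList)[-1]) {  # seq_along()[-1] is safe for length 1
    mdat <- merge(mdat, dataList[[i]], by = keys, all = TRUE)
  }
  mdat
}

# Reduce() version: the same left-to-right fold, written as a one-liner
try1 <- mymerge(dataList)
try2 <- Reduce(function(x, y) merge(x, y, by = c("RID", "VISCODE"), all = TRUE),
               dataList)
```

Both compute merge(merge(merge(d1, d2), d3), d4), so on this data they should give the same 13 rows (one per distinct RID/VISCODE pair).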
For my test data and subsets of my real data, both of these work fine and produce exactly the same results. However, when I use larger subsets of my data, they both break down and give me the following error: Error in match.names(clabs, names(xi)) : names do not match previous names.
The really weird thing is that using this works:
dataList = list(demog[1:50,],
neurobat[1:50,],
apoe[1:50,],
mmse[1:50,],
faq[1:47, ])
And using this fails:
dataList = list(demog[1:50,],
neurobat[1:50,],
apoe[1:50,],
mmse[1:50,],
faq[1:48, ])
As far as I can tell, there is nothing special about row 48 of faq. Likewise, using this works:
dataList = list(demog[1:50,],
neurobat[1:50,],
apoe[1:50,],
mmse[1:50,],
pdx[1:47, ])
And using this fails:
dataList = list(demog[1:50,],
neurobat[1:50,],
apoe[1:50,],
mmse[1:50,],
pdx[1:48, ])
Row 48 in faq and row 48 in pdx have the same values for RID and VISCODE, the same value for EXAMDATE (something I'm not matching on), and different values for ID (another thing I'm not matching on). Besides the matching RID and VISCODE, I don't see anything special about them. They don't share any other variable names. This same scenario occurs elsewhere in the data without problems.
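Every table here, including faq and pdx, carries an ID column that is not a merge key. merge() appends .x and .y to non-key columns that appear in both inputs, and when the same name occurs in three or more tables those suffixed names can collide. A minimal sketch of a check for such shared names, plus one possible workaround that tags them with the table's position in the list; both helpers (shared_nonkey_names() and make_nonkey_unique()) are hypothetical, not part of base R or the original post, and the tables are made up.

```r
# Hypothetical helper: non-key column names that occur in more than one data frame
shared_nonkey_names <- function(dfs, keys = c("RID", "VISCODE")) {
  nonkey <- unlist(lapply(dfs, function(d) setdiff(names(d), keys)))
  counts <- table(nonkey)
  names(counts)[counts > 1]
}

# Hypothetical workaround: append "_<position in list>" to every shared non-key
# column, so merge() never has to invent .x/.y names
make_nonkey_unique <- function(dfs, keys = c("RID", "VISCODE")) {
  shared <- shared_nonkey_names(dfs, keys)
  lapply(seq_along(dfs), function(i) {
    d <- dfs[[i]]
    hit <- names(d) %in% shared
    if (any(hit)) names(d)[hit] <- paste0(names(d)[hit], "_", i)
    d
  })
}

# Small made-up tables that all carry a non-key ID column
dfs <- list(
  data.frame(RID = c(2, 5), VISCODE = "bl", ID = c(11, 12), value1 = 16),
  data.frame(RID = c(2, 7), VISCODE = "bl", ID = c(21, 22), value2 = 100),
  data.frame(RID = c(5, 9), VISCODE = "bl", ID = c(31, 32), value3 = "a")
)

shared_nonkey_names(dfs)  # "ID"
merged <- Reduce(function(x, y) merge(x, y, by = c("RID", "VISCODE"), all = TRUE),
                 make_nonkey_unique(dfs))
names(merged)  # "RID" "VISCODE" "ID_1" "value1" "ID_2" "value2" "ID_3" "value3"
```

Dropping the shared column from all but one table (see the column-removal approach further down) would be an alternative if the extra IDs are not needed.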
To add icing on the complication cake, this doesn't even work:
dataList = list(demog[1:50,],
neurobat[1:50,],
apoe[1:50,],
mmse[1:50,],
faq[1:48, 2:3])
where columns 2 and 3 are "RID" and "VISCODE".
48 isn't even the magic number because this works:
dataList = list(demog[1:500,],
neurobat[1:500,],
apoe[1:500,],
mmse[1:457,])
while using mmse[1:458, ] fails.
I can't seem to come up with test data that causes the problem. Has anyone had this problem before? Any better ideas on how to merge?
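R (for example in the RStudio console) returns the error message “Error in match.names(clabs, names(xi)) : names do not match previous names”. It means that the column names of the data frames being combined, for example the first and the second data frame, are not the same. merge(..., all = TRUE) can raise it too, because it calls rbind() internally to append the rows of the second data frame that have no match in the first.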
The R error “names do not match previous names” occurs when you combine two or more data frames with rbind() and one or more of their column names do not match. You can either make the column names identical with names(), or fill the missing columns with NA; dplyr's bind_rows() does this filling automatically, whereas base rbind() does not.
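A minimal base-R sketch of both points, using two made-up data frames: the first call reproduces the error, then each frame gets its missing columns added as NA. The helper fill_missing() is hypothetical and stands in for dplyr::bind_rows(), which is left out so the example has no package dependency.

```r
a <- data.frame(RID = c(2, 5), value1 = c(16, 16))
b <- data.frame(RID = 7, value2 = 100)

# rbind() matches data frame columns by name, so differing names are an error
msg <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))
print(msg)  # the "names do not match previous names" message

# Hypothetical helper: add each missing column as NA, then put columns in a common order
fill_missing <- function(d, cols) {
  for (col in setdiff(cols, names(d))) d[[col]] <- NA
  d[cols]
}
all_cols <- union(names(a), names(b))
stacked <- rbind(fill_missing(a, all_cols), fill_missing(b, all_cols))
stacked  # 3 rows; value2 is NA for the rows from a, value1 is NA for the row from b
```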
Method 1: using colnames(). The colnames() function in R renames and replaces the column names of a data frame. You rename the columns by assigning a vector of new names, and each new name replaces the corresponding old name of the column in the data frame.
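For example, with two made-up data frames whose key columns are spelled differently, assigning a new name vector to colnames() lets rbind() line them up:

```r
a <- data.frame(RID = c(2, 5), VISCODE = c("bl", "bl"))
b <- data.frame(rid = 7, viscode = "m06")

# Each new name replaces the old name in the same position
colnames(b) <- c("RID", "VISCODE")
# colnames(b) <- colnames(a) would also work here, as long as the column order matches

stacked <- rbind(a, b)  # no longer fails: both frames now have RID, VISCODE
```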
In R, a simple way to remove columns from a data frame by name is the %in% operator: combined with names(), it lets you specify the redundant column names and drop them from the data frame. Alternatively, you can use the subset() function or the dplyr package.
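Both base-R options look like this on a small made-up data frame (ID and EXAMDATE are borrowed from the question only as example column names; the values are invented):

```r
df <- data.frame(RID = c(2, 5), VISCODE = c("bl", "bl"),
                 ID = c(11, 12), EXAMDATE = c("2010-01-04", "2010-02-08"))
drop_cols <- c("ID", "EXAMDATE")

# %in% flags the unwanted names; ! keeps everything else
kept1 <- df[, !(names(df) %in% drop_cols), drop = FALSE]

# subset() with a negative select drops columns given as unquoted names
kept2 <- subset(df, select = -c(ID, EXAMDATE))
```

drop = FALSE keeps the result a data frame even if only one column survives.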
Not sure I can help, unfortunately, but I thought I would post since I found this while searching for help on this error. What I effectively had was:
a <- cbind(b,c)
d <- merge(a,e)
And I got the same error. Using a <- data.frame(b,c) instead fixed the problem, but I can't work out why.
object.size(a);1248124200 bytes
object.size(c);1248124032 bytes
So something is different. All classes are the same and str() reveals nothing. I'm stumped.
Hopefully that aids someone else in the know.
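If b and c in the post are data frames that share some column names (an assumption, since the post does not show them), one difference that may matter is how the two calls treat duplicate names. The frames below are made up, and c_df stands in for c.

```r
b <- data.frame(RID = c(2, 5), value = c(1, 2))
c_df <- data.frame(RID = c(2, 5), value = c(3, 4))

via_cbind <- cbind(b, c_df)       # cbind.data.frame uses check.names = FALSE
via_df    <- data.frame(b, c_df)  # data.frame() defaults to check.names = TRUE

names(via_cbind)  # "RID" "value" "RID" "value"
names(via_df)     # "RID" "value" "RID.1" "value.1"
```

With cbind() the duplicated names survive, and duplicated names can then trip the name matching that merge() and rbind() do; data.frame() makes them unique first.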