How can I use merge to cbind two dataframes

Tags: merge, dataframe, r

Suppose I have two dataframes:

df1 <- data.frame(matrix(rnorm(10*10),ncol=10))
df2 <- data.frame(matrix(rnorm(10*10),ncol=10))
colnames(df1) <- 1:10
colnames(df2) <- 11:20

How do I use merge to cbind these? (I already know about cbind, but I am interested in how merge applies here.)

asked Dec 16 '13 03:12

user2763361

1 Answer

I have made the matrices smaller for display purposes.

> df1 <- data.frame(matrix(rnorm(15),ncol=3))
> df2 <- data.frame(matrix(rnorm(15),ncol=3))
> colnames(df1) <- paste0("A", 1:3)
> colnames(df2) <- paste0("B", 4:6)

We have data frames df1 and df2, with column names A1, A2 & A3 and B4, B5 & B6 respectively.

So, as you know, cbind() just sticks the data frames together, side by side.

> cbind(df1, df2)
         A1        A2       A3        B4       B5        B6
1  2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2  0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
3 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
4 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
5  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555

merge() looks for common columns. In this case there are none, so it returns the Cartesian product (a cross join), in which each row of df1 is paired with each row of df2.

> merge(df1, df2)
          A1        A2       A3        B4       B5        B6
1   2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2   0.010779  0.086778 -0.68413 -1.748416  0.41855 -0.516635
3  -0.732393  0.235125 -0.89306 -1.748416  0.41855 -0.516635
4  -2.493843 -2.654263  0.36107 -1.748416  0.41855 -0.516635
5   0.935540  0.398196 -0.43043 -1.748416  0.41855 -0.516635
6   2.055780  0.362796  1.25536  1.183762 -1.20362  0.041147
7   0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
8  -0.732393  0.235125 -0.89306  1.183762 -1.20362  0.041147
9  -2.493843 -2.654263  0.36107  1.183762 -1.20362  0.041147
10  0.935540  0.398196 -0.43043  1.183762 -1.20362  0.041147
11  2.055780  0.362796  1.25536  1.435362 -0.26066 -0.025933
12  0.010779  0.086778 -0.68413  1.435362 -0.26066 -0.025933
13 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
14 -2.493843 -2.654263  0.36107  1.435362 -0.26066 -0.025933
15  0.935540  0.398196 -0.43043  1.435362 -0.26066 -0.025933
16  2.055780  0.362796  1.25536  0.083018 -0.82251 -0.991135
17  0.010779  0.086778 -0.68413  0.083018 -0.82251 -0.991135
18 -0.732393  0.235125 -0.89306  0.083018 -0.82251 -0.991135
19 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
20  0.935540  0.398196 -0.43043  0.083018 -0.82251 -0.991135
21  2.055780  0.362796  1.25536  0.470559 -0.54146  1.955555
22  0.010779  0.086778 -0.68413  0.470559 -0.54146  1.955555
23 -0.732393  0.235125 -0.89306  0.470559 -0.54146  1.955555
24 -2.493843 -2.654263  0.36107  0.470559 -0.54146  1.955555
25  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555
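To make the row count concrete, here is a minimal sketch of the same situation. The seed is an arbitrary choice for reproducibility, so the values will not match the output above; the variable name crossed is only illustrative.

```r
set.seed(1)  # arbitrary seed, assumed here only so the sketch is reproducible
df1 <- data.frame(matrix(rnorm(15), ncol = 3))
df2 <- data.frame(matrix(rnorm(15), ncol = 3))
colnames(df1) <- paste0("A", 1:3)
colnames(df2) <- paste0("B", 4:6)

# No shared column names, so merge() falls back to a Cartesian product:
# every row of df1 is paired with every row of df2.
crossed <- merge(df1, df2)

nrow(crossed)                           # 5 x 5 = 25 rows
nrow(crossed) == nrow(df1) * nrow(df2)  # TRUE

# Passing by = NULL asks for the same cross join explicitly.
crossed_explicit <- merge(df1, df2, by = NULL)
dim(crossed_explicit)
```

So without a common key, merge() multiplies rows rather than lining them up the way cbind() does.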

If we rename the first column in df1 so that it matches the name of the first column in df2, then merge() joins on that shared column and keeps only the rows whose values appear in both data frames. Since there are no common values, the output is empty.

> colnames(df1)[1] = "B4"
> merge(df1, df2)
[1] B4 A2 A3 B5 B6
<0 rows> (or 0-length row.names)
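As a side note, the empty result comes from the default inner join. A small sketch, using an arbitrary seed and illustrative names (inner, outer), shows how the all argument of merge() changes that: unmatched rows from both sides are kept and padded with NA.

```r
set.seed(1)  # arbitrary seed, assumed here only so the sketch is reproducible
df1 <- data.frame(matrix(rnorm(15), ncol = 3))
df2 <- data.frame(matrix(rnorm(15), ncol = 3))
colnames(df1) <- paste0("A", 1:3)
colnames(df2) <- paste0("B", 4:6)
colnames(df1)[1] <- "B4"  # shared column name, but no shared values

# Default inner join: only rows with matching B4 values survive.
inner <- merge(df1, df2)

# Full outer join: unmatched rows from both data frames are kept,
# with NA in the columns that come from the other data frame.
outer <- merge(df1, df2, all = TRUE)

nrow(inner)  # 0
nrow(outer)  # 10: five unmatched rows from each side
```

This still is not a cbind(): the rows are stacked, not aligned side by side.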

But now if we copy (and reverse, just to make things interesting!) the first column of df2 into the first column of df1...

> df1$B4 = rev(df2$B4)
> df1
        B4       A2        A3
1 -0.50647 -0.48128  0.540799
2 -0.70684 -0.35401  0.872514
3  0.14341  1.12184 -0.079913
4 -0.59989  0.81912  1.726494
5  0.33864  0.85277  0.386702
> df2
        B4       B5        B6
1  0.33864  1.83677  0.406717
2 -0.59989 -0.43630  0.075029
3  0.14341  1.01496  0.095534
4 -0.70684  1.32414 -0.122613
5 -0.50647  0.70709 -0.700225

... and try to merge again...

> merge(df1, df2)
        B4       A2        A3       B5        B6
1 -0.70684 -0.35401  0.872514  1.32414 -0.122613
2 -0.59989  0.81912  1.726494 -0.43630  0.075029
3 -0.50647 -0.48128  0.540799  0.70709 -0.700225
4  0.14341  1.12184 -0.079913  1.01496  0.095534
5  0.33864  0.85277  0.386702  1.83677  0.406717

... we finally get something meaningful: the rows of df1 and df2 are stuck together according to values in df1$B4 matching values in df2$B4.
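Following the same idea, one way to get a cbind()-like result out of merge() is to give both data frames a key that encodes the row position. The sketch below assumes a helper column called row_id (a hypothetical name, not part of the original data) and an arbitrary seed. merge() also accepts by = "row.names", but that adds a Row.names column holding the row names as character strings, so an explicit integer key is used here instead.

```r
set.seed(1)  # arbitrary seed, assumed here only so the sketch is reproducible
df1 <- data.frame(matrix(rnorm(15), ncol = 3))
df2 <- data.frame(matrix(rnorm(15), ncol = 3))
colnames(df1) <- paste0("A", 1:3)
colnames(df2) <- paste0("B", 4:6)

expected <- cbind(df1, df2)  # the result we want to reproduce

# Hypothetical helper column: the row position serves as the join key.
df1$row_id <- seq_len(nrow(df1))
df2$row_id <- seq_len(nrow(df2))

# Join on the key; the result is sorted by row_id, so the row order is kept.
merged <- merge(df1, df2, by = "row_id")
merged$row_id <- NULL  # drop the helper column again

all.equal(merged, expected, check.attributes = FALSE)
```

This only lines up with cbind() when both data frames have the same number of rows; otherwise the inner join silently drops the unmatched positions.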

I hope that this helps.

answered Sep 27 '22 22:09

datawookie
