
How can I use merge to cbind two data frames?

Tags: merge, dataframe, r

Suppose I have two dataframes:

df1 <- data.frame(matrix(rnorm(10*10),ncol=10))
df2 <- data.frame(matrix(rnorm(10*10),ncol=10))
colnames(df1) <- 1:10
colnames(df2) <- 11:20

How do I use merge to cbind these? (I already know about cbind, but I am interested in how merge applies here.)

user2763361 asked Dec 16 '13 03:12



1 Answer

I have made the matrices smaller for display purposes.

> df1 <- data.frame(matrix(rnorm(15),ncol=3))
> df2 <- data.frame(matrix(rnorm(15),ncol=3))
> colnames(df1) <- paste0("A", 1:3)
> colnames(df2) <- paste0("B", 4:6)

We have data frames df1 and df2, with column names A1, A2 & A3 and B4, B5 & B6 respectively.

So, as you know, cbind() just sticks the data frames together, side by side, pairing rows by position.

> cbind(df1, df2)
         A1        A2       A3        B4       B5        B6
1  2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2  0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
3 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
4 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
5  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555

merge() looks for common columns. In this case there are none, so it produces a Cartesian product (a cross join) in which each row of df1 is paired with each row of df2, giving 5 x 5 = 25 rows.

> merge(df1, df2)
          A1        A2       A3        B4       B5        B6
1   2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2   0.010779  0.086778 -0.68413 -1.748416  0.41855 -0.516635
3  -0.732393  0.235125 -0.89306 -1.748416  0.41855 -0.516635
4  -2.493843 -2.654263  0.36107 -1.748416  0.41855 -0.516635
5   0.935540  0.398196 -0.43043 -1.748416  0.41855 -0.516635
6   2.055780  0.362796  1.25536  1.183762 -1.20362  0.041147
7   0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
8  -0.732393  0.235125 -0.89306  1.183762 -1.20362  0.041147
9  -2.493843 -2.654263  0.36107  1.183762 -1.20362  0.041147
10  0.935540  0.398196 -0.43043  1.183762 -1.20362  0.041147
11  2.055780  0.362796  1.25536  1.435362 -0.26066 -0.025933
12  0.010779  0.086778 -0.68413  1.435362 -0.26066 -0.025933
13 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
14 -2.493843 -2.654263  0.36107  1.435362 -0.26066 -0.025933
15  0.935540  0.398196 -0.43043  1.435362 -0.26066 -0.025933
16  2.055780  0.362796  1.25536  0.083018 -0.82251 -0.991135
17  0.010779  0.086778 -0.68413  0.083018 -0.82251 -0.991135
18 -0.732393  0.235125 -0.89306  0.083018 -0.82251 -0.991135
19 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
20  0.935540  0.398196 -0.43043  0.083018 -0.82251 -0.991135
21  2.055780  0.362796  1.25536  0.470559 -0.54146  1.955555
22  0.010779  0.086778 -0.68413  0.470559 -0.54146  1.955555
23 -0.732393  0.235125 -0.89306  0.470559 -0.54146  1.955555
24 -2.493843 -2.654263  0.36107  0.470559 -0.54146  1.955555
25  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555
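
The cross-join behaviour can also be requested explicitly. A minimal sketch, assuming freshly generated data (the seed and values are illustrative, not the ones printed above): passing by = NULL tells merge() to build the Cartesian product, which is what it falls back to when no column names are shared.

```r
# Illustrative data; df1 and df2 mirror the answer's names, values are random.
set.seed(1)
df1 <- data.frame(matrix(rnorm(15), ncol = 3))
df2 <- data.frame(matrix(rnorm(15), ncol = 3))
colnames(df1) <- paste0("A", 1:3)
colnames(df2) <- paste0("B", 4:6)

# by = NULL asks for the Cartesian product of the two data frames.
crossed <- merge(df1, df2, by = NULL)

# Every row of df1 is paired with every row of df2.
nrow(crossed) == nrow(df1) * nrow(df2)
```

The row count grows multiplicatively, so this is rarely what you want when the goal is a side-by-side bind.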

If we rename the first column in df1 so that it matches the name of the first column in df2, then merge() joins on that shared column and keeps only rows whose values match in both data frames. Since there are no common values, the output is empty.

> colnames(df1)[1] = "B4"
> merge(df1, df2)
[1] B4 A2 A3 B5 B6
<0 rows> (or 0-length row.names)
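
As a sketch of what is going on here, assuming freshly generated random data: the default is an inner join, so rows without a partner are dropped. Naming the key with by = "B4" makes the join column explicit, and all = TRUE switches to a full outer join that keeps the unmatched rows and fills the gaps with NA. The names inner and full are just illustrative.

```r
# Illustrative data with one shared column name (B4) but no shared values.
set.seed(2)
df1 <- data.frame(matrix(rnorm(15), ncol = 3))
df2 <- data.frame(matrix(rnorm(15), ncol = 3))
colnames(df1) <- c("B4", "A2", "A3")
colnames(df2) <- paste0("B", 4:6)

# Inner join on B4: only rows whose B4 values occur in both data frames survive.
inner <- merge(df1, df2, by = "B4")
nrow(inner)

# Full outer join: unmatched rows from both sides are kept, padded with NA.
full <- merge(df1, df2, by = "B4", all = TRUE)
nrow(full)
colSums(is.na(full))
```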

But now if we copy (and reverse, just to make things interesting!) the first column of df2 into the first column of df1...

> df1$B4 = rev(df2$B4)
> df1
        B4       A2        A3
1 -0.50647 -0.48128  0.540799
2 -0.70684 -0.35401  0.872514
3  0.14341  1.12184 -0.079913
4 -0.59989  0.81912  1.726494
5  0.33864  0.85277  0.386702
> df2
        B4       B5        B6
1  0.33864  1.83677  0.406717
2 -0.59989 -0.43630  0.075029
3  0.14341  1.01496  0.095534
4 -0.70684  1.32414 -0.122613
5 -0.50647  0.70709 -0.700225

... and try to merge again...

> merge(df1, df2)
        B4       A2        A3       B5        B6
1 -0.70684 -0.35401  0.872514  1.32414 -0.122613
2 -0.59989  0.81912  1.726494 -0.43630  0.075029
3 -0.50647 -0.48128  0.540799  0.70709 -0.700225
4  0.14341  1.12184 -0.079913  1.01496  0.095534
5  0.33864  0.85277  0.386702  1.83677  0.406717

... we finally get something meaningful: each row of df1 is joined to the row of df2 that has the same B4 value, and the result is sorted by B4.
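
To get something close to cbind() out of merge(), one option is to join on the row names rather than on a data column. This is a sketch under the assumption that both data frames have the same default row names; merged is an illustrative name. Note that merge() sorts the row names as text, so with ten rows "10" lands before "2" and the order has to be restored by hand.

```r
# Illustrative data with ten rows, as in the question, but only three columns each.
set.seed(3)
df1 <- data.frame(matrix(rnorm(30), ncol = 3))
df2 <- data.frame(matrix(rnorm(30), ncol = 3))
colnames(df1) <- paste0("A", 1:3)
colnames(df2) <- paste0("B", 4:6)

# by = "row.names" (equivalently by = 0) joins on the row names of both inputs.
merged <- merge(df1, df2, by = "row.names")

# merge() adds a Row.names column and sorts on it as text.
# Restore numeric order, then drop the helper column and reset the row names.
merged <- merged[order(as.integer(merged$Row.names)), ]
merged$Row.names <- NULL
rownames(merged) <- NULL

# Sanity check against cbind().
all.equal(merged, cbind(df1, df2))
```

The extra reordering step is the main reason cbind() remains the simpler tool when the rows already line up.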

I hope that this helps.

datawookie answered Sep 27 '22 22:09