Suppose I have two dataframes:
df1 <- data.frame(matrix(rnorm(10*10),ncol=10))
df2 <- data.frame(matrix(rnorm(10*10),ncol=10))
colnames(df1) <- 1:10
colnames(df2) <- 11:20
How do I use merge
to cbind
these (I already know about cbind
but I am interested in the application of merge
here).
cbind() and rbind() both create matrices by combining several vectors of the same length. cbind() combines vectors as columns, while rbind() combines them as rows.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .
cbind() function in R Language is used to combine specified Vector, Matrix or Data Frame by columns. deparse. level: This value determines how the column names generated. The default value of deparse.
I have made the matrices smaller for display purposes.
> df1 <- data.frame(matrix(rnorm(15),ncol=3))
> df2 <- data.frame(matrix(rnorm(15),ncol=3))
> colnames(df1) <- paste0("A", 1:3)
> colnames(df2) <- paste0("B", 4:6)
We have data frames df1 and df2, with columns names A1, A2 & A3 and B1, B2 & B3 respectively.
So,as you know, cbind() just sticks the data frames together, side by side.
> cbind(df1, df2)
A1 A2 A3 B4 B5 B6
1 2.055780 0.362796 1.25536 -1.748416 0.41855 -0.516635
2 0.010779 0.086778 -0.68413 1.183762 -1.20362 0.041147
3 -0.732393 0.235125 -0.89306 1.435362 -0.26066 -0.025933
4 -2.493843 -2.654263 0.36107 0.083018 -0.82251 -0.991135
5 0.935540 0.398196 -0.43043 0.470559 -0.54146 1.955555
merge() looks for common columns. In this case there are none, so it produces essentially an outer product in which each row of df1 is matched against each row of df2.
> merge(df1, df2)
A1 A2 A3 B4 B5 B6
1 2.055780 0.362796 1.25536 -1.748416 0.41855 -0.516635
2 0.010779 0.086778 -0.68413 -1.748416 0.41855 -0.516635
3 -0.732393 0.235125 -0.89306 -1.748416 0.41855 -0.516635
4 -2.493843 -2.654263 0.36107 -1.748416 0.41855 -0.516635
5 0.935540 0.398196 -0.43043 -1.748416 0.41855 -0.516635
6 2.055780 0.362796 1.25536 1.183762 -1.20362 0.041147
7 0.010779 0.086778 -0.68413 1.183762 -1.20362 0.041147
8 -0.732393 0.235125 -0.89306 1.183762 -1.20362 0.041147
9 -2.493843 -2.654263 0.36107 1.183762 -1.20362 0.041147
10 0.935540 0.398196 -0.43043 1.183762 -1.20362 0.041147
11 2.055780 0.362796 1.25536 1.435362 -0.26066 -0.025933
12 0.010779 0.086778 -0.68413 1.435362 -0.26066 -0.025933
13 -0.732393 0.235125 -0.89306 1.435362 -0.26066 -0.025933
14 -2.493843 -2.654263 0.36107 1.435362 -0.26066 -0.025933
15 0.935540 0.398196 -0.43043 1.435362 -0.26066 -0.025933
16 2.055780 0.362796 1.25536 0.083018 -0.82251 -0.991135
17 0.010779 0.086778 -0.68413 0.083018 -0.82251 -0.991135
18 -0.732393 0.235125 -0.89306 0.083018 -0.82251 -0.991135
19 -2.493843 -2.654263 0.36107 0.083018 -0.82251 -0.991135
20 0.935540 0.398196 -0.43043 0.083018 -0.82251 -0.991135
21 2.055780 0.362796 1.25536 0.470559 -0.54146 1.955555
22 0.010779 0.086778 -0.68413 0.470559 -0.54146 1.955555
23 -0.732393 0.235125 -0.89306 0.470559 -0.54146 1.955555
24 -2.493843 -2.654263 0.36107 0.470559 -0.54146 1.955555
25 0.935540 0.398196 -0.43043 0.470559 -0.54146 1.955555
If we rename the first column in df1 so that it matches the name of the first column in df2 then merge() looks for common values in those two columns. Since there are no common values, the output is empty.
> colnames(df1)[1] = "B4"
> merge(df1, df2)
[1] B4 A2 A3 B5 B6
<0 rows> (or 0-length row.names)
But now if we copy (and reverse, just to make things interesting!) the first column of df2 into the first column of df1...
> df1$B4 = rev(df2$B4)
> df1
B4 A2 A3
1 -0.50647 -0.48128 0.540799
2 -0.70684 -0.35401 0.872514
3 0.14341 1.12184 -0.079913
4 -0.59989 0.81912 1.726494
5 0.33864 0.85277 0.386702
> df2
B4 B5 B6
1 0.33864 1.83677 0.406717
2 -0.59989 -0.43630 0.075029
3 0.14341 1.01496 0.095534
4 -0.70684 1.32414 -0.122613
5 -0.50647 0.70709 -0.700225
... and try to merge again...
> merge(df1, df2)
B4 A2 A3 B5 B6
1 -0.70684 -0.35401 0.872514 1.32414 -0.122613
2 -0.59989 0.81912 1.726494 -0.43630 0.075029
3 -0.50647 -0.48128 0.540799 0.70709 -0.700225
4 0.14341 1.12184 -0.079913 1.01496 0.095534
5 0.33864 0.85277 0.386702 1.83677 0.406717
... we finally get something meaningful: the rows of df1 and df2 are stuck together according to values in df1$B4 matching values in df2$B4.
I hope that this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With