My data.frame DATA is
  k    l    g
1 A 2004   12
2 B 2004  3.4
3 C 2004  4.5
Another data.frame DATA2 is
  i    d    t
1 A 2012   22
2 B 2012  4.8
3 C 2012  5.6
I want to get
1 A 2004 12
1 A 2012 22
2 B 2004 3.4
2 B 2012 4.8
3 C 2004 4.5
3 C 2012 5.6
In pandas (Python), the concat() function can be used to concatenate two DataFrames by adding the rows of one to the other. The pandas merge() function is equivalent to the SQL JOIN clause; 'left', 'right', 'outer' and 'inner' joins are all possible.
In the pandas DataFrame merge() syntax, these join types correspond to SQL left outer join, right outer join, full outer join, and inner join. on: column or index level names to join on. These columns must be present in both DataFrames. If on is not provided and the merge is not on indexes, the intersection of the columns in both DataFrames is used.
To join two data frames (datasets) vertically in R, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables in data frame A before binding.
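As a minimal base R sketch of the rbind approach applied to the question's data: the values are copied from the question, df1 and df2 stand in for DATA and DATA2, and renaming the second data frame with setNames is one way (not the only one) to satisfy the "same variables" requirement.

```r
# Data copied from the question; df1/df2 stand in for DATA/DATA2.
df1 <- data.frame(k = c("A", "B", "C"), l = 2004, g = c(12, 3.4, 4.5))
df2 <- data.frame(i = c("A", "B", "C"), d = 2012, t = c(22, 4.8, 5.6))

# rbind() on data frames matches columns by name, so give df2 the names of df1.
stacked <- rbind(df1, setNames(df2, names(df1)))

# Sort by key, then by year, so that the rows for each key sit together.
result <- stacked[order(stacked$k, stacked$l), ]
rownames(result) <- NULL
result
```

Sorting on both k and l keeps the 2004 row ahead of the 2012 row within each key.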
In R, the merge() function joins two data frames. It is part of base R; the dplyr package provides a comparable family of functions (inner_join(), left_join(), and so on). The most important condition for joining two data frames is that the columns being merged on have the same type. merge() works much like a join in a DBMS.
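For contrast, here is a small sketch of what merge() does with the question's data (again using df1 and df2 as stand-ins for DATA and DATA2). It joins on the key and places the columns side by side, which is probably not the stacked layout the question asks for.

```r
# Data copied from the question; df1/df2 stand in for DATA/DATA2.
df1 <- data.frame(k = c("A", "B", "C"), l = 2004, g = c(12, 3.4, 4.5))
df2 <- data.frame(i = c("A", "B", "C"), d = 2012, t = c(22, 4.8, 5.6))

# The key column is named differently in each input, hence by.x / by.y.
wide <- merge(df1, df2, by.x = "k", by.y = "i")

# One row per key, with the columns of both inputs next to each other.
wide
```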
We can try rbindlist from data.table. Place the datasets in a list, rbind them with rbindlist, and order by the first column.
library(data.table)
rbindlist(list(df1, df2))[order(k)]
# k l g
#1: A 2004 12.0
#2: A 2012 22.0
#3: B 2004 3.4
#4: B 2012 4.8
#5: C 2004 4.5
#6: C 2012 5.6
Or using dplyr
library(dplyr)
bind_rows(df1, setNames(df2, names(df1))) %>%
  arrange(k)
NOTE: I used df1 and df2 in place of DATA and DATA2 as object names, as they are easier to type.
You can try the interleave function from the "gdata" package. However, this requires that your inputs have the same column names and the same number of rows.
The approach would be:
library(gdata) # for interleave
do.call(interleave, lapply(list(df1, df2), setNames, paste0("V", 1:ncol(df1))))
# V1 V2 V3
# 1 A 2004 12.0
# 11 A 2012 22.0
# 2 B 2004 3.4
# 21 B 2012 4.8
# 3 C 2004 4.5
# 31 C 2012 5.6
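If installing "gdata" is not an option, the same position-based interleaving can be sketched in base R. This is only an illustration under the same assumptions (both inputs have the same number of columns and rows); it is not how interleave itself is implemented.

```r
# Data copied from the question; df1/df2 stand in for DATA/DATA2.
df1 <- data.frame(k = c("A", "B", "C"), l = 2004, g = c(12, 3.4, 4.5))
df2 <- data.frame(i = c("A", "B", "C"), d = 2012, t = c(22, 4.8, 5.6))

# Stack the inputs after giving df2 the column names of df1.
stacked <- rbind(df1, setNames(df2, names(df1)))

# Within-source row positions: 1, 2, 3 for df1 followed by 1, 2, 3 for df2.
pos <- c(seq_len(nrow(df1)), seq_len(nrow(df2)))

# order() keeps ties in their original order, so row 1 of df1 comes before
# row 1 of df2, then row 2 of df1, row 2 of df2, and so on.
interleaved <- stacked[order(pos), ]
rownames(interleaved) <- NULL
interleaved
```

Because this orders by row position rather than by the value of the first column, it interleaves correctly even when the first column is not a sorted grouping variable.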
Alternatively, as mentioned in my comment on @akrun's answer, depending on whether the first column is a grouping variable or not, you may want to modify his approach a little.
For instance, imagine there were a third data.frame, with a different number of rows than the others. interleave would not work on that, but the rbindlist approach would.
df3 <- do.call(rbind, lapply(list(df1, df2), setNames, c("A", "B", "Z")))
rbindlist(list(df1, df2, df3), idcol = TRUE)[, N := sequence(.N), by = .id][order(N)]
# .id k l g N
# 1: 1 A 2004 12.0 1
# 2: 2 A 2012 22.0 1
# 3: 3 A 2004 12.0 1
# 4: 1 B 2004 3.4 2
# 5: 2 B 2012 4.8 2
# 6: 3 B 2004 3.4 2
# 7: 1 C 2004 4.5 3
# 8: 2 C 2012 5.6 3
# 9: 3 C 2004 4.5 3
# 10: 3 A 2012 22.0 4
# 11: 3 B 2012 4.8 5
# 12: 3 C 2012 5.6 6
Pay specific attention to the last three rows in comparison with @akrun's approach.
The equivalent in base R for that last "data.table" approach would be something like:
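# Stack the data frames, tagging each row with the name of its source object.
x <- do.call(rbind, lapply(c("df1", "df2", "df3"), function(nm) {
  d <- get(nm)
  setNames(cbind(id = nm, d), c("id", paste0("V", seq_len(ncol(d)))))
}))
# Number the rows within each source, then order by that within-source position.
x[order(ave(seq_len(nrow(x)), x$id, FUN = seq_along)), ]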
(So the moral is, use "data.table".)