On the web, I found that rbind() is used to combine two data frames by rows, and that the same task is performed by the bind_rows() function from dplyr.
What's the difference between these two functions, and which one is more efficient?
cbind() and rbind() both create matrices by combining several vectors of the same length. cbind() combines vectors as columns, while rbind() combines them as rows.
bind_rows() is half as fast as data.table::rbindlist().
The bind_rows() function from dplyr combines the rows of two or more data frames. For example, given three data frames data1, data2 and data3, each with some rows and columns, bind_rows(data1, data2, data3) stacks all of their rows into a single data frame.
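A minimal runnable sketch of that case, assuming dplyr is installed. The contents of data1, data2 and data3 are made up here, since the code that originally defined them is not shown:

```r
library(dplyr)

# Hypothetical stand-ins for the three data frames described above
data1 <- data.frame(id = 1:2, score = c(10, 20))
data2 <- data.frame(id = 3:4, score = c(30, 40))
data3 <- data.frame(id = 5:6, score = c(50, 60))

# bind_rows() accepts any number of data frames as separate arguments...
combined <- bind_rows(data1, data2, data3)
# ...or a single list of data frames
combined_from_list <- bind_rows(list(data1, data2, data3))

nrow(combined)  # 6
```

Both call styles should give the same result; the list form is handy when the data frames were produced in a loop or by lapply().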
The name of the rbind R function stands for row-bind. The rbind function can be used to combine several vectors, matrices and/or data frames by rows.
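For plain vectors, a quick base-R illustration of the row/column distinction between rbind() and cbind() (the vectors x and y are made up for the example):

```r
x <- 1:3
y <- 4:6

by_rows <- rbind(x, y)  # 2 x 3 matrix; rows are named "x" and "y"
by_cols <- cbind(x, y)  # 3 x 2 matrix; columns are named "x" and "y"

dim(by_rows)  # 2 3
dim(by_cols)  # 3 2

# rbind() can also stack a matrix and a vector of matching length
stacked <- rbind(by_rows, z = 7:9)
dim(stacked)  # 3 3
```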
Apart from a few other differences, one of the main reasons to use bind_rows over rbind is to combine two data frames that have a different number of columns. rbind throws an error in such a case, whereas bind_rows fills the columns missing from one of the data frames with NA in that data frame's rows.
Try out the following code to see the difference:
a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)
Results for the two calls are as follows:
> rbind(a, b)
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match
> library(dplyr)
> bind_rows(a, b)
  a b c  d
1 1 3 5 NA
2 2 4 6 NA
3 7 2 3  8
4 8 3 4  9
Since none of the answers here offers a systematic review of the differences between base::rbind and dplyr::bind_rows, and the answer from @bob regarding performance is incorrect, I decided to add the following.
Let's create a test data frame:
df_1 = data.frame(
v1_dbl = 1:1000,
v2_lst = I(as.list(1:1000)),
v3_fct = factor(sample(letters[1:10], 1000, replace = TRUE)),
v4_raw = raw(1000),
v5_dtm = as.POSIXct(paste0("2019-12-0", sample(1:9, 1000, replace = TRUE)))
)
df_1$v2_lst = unclass(df_1$v2_lst) #remove the AsIs class introduced by `I()`
base::rbind handles list inputs differently
rbind(list(df_1, df_1))
[,1] [,2]
[1,] List,5 List,5
# You have to combine it with `do.call()` to achieve the same result:
head(do.call(rbind, list(df_1, df_1)), 3)
v1_dbl v2_lst v3_fct v4_raw v5_dtm
1 1 1 b 00 2019-12-02
2 2 2 h 00 2019-12-08
3 3 3 c 00 2019-12-09
head(dplyr::bind_rows(list(df_1, df_1)), 3)
v1_dbl v2_lst v3_fct v4_raw v5_dtm
1 1 1 b 00 2019-12-02
2 2 2 h 00 2019-12-08
3 3 3 c 00 2019-12-09
base::rbind can cope with (some) mixed types
While both base::rbind and dplyr::bind_rows fail when trying to bind, e.g., a raw or datetime column to a column of some other type, base::rbind can cope with some degree of discrepancy.
Combining a list and a non-list column produces a list column. Combining a factor and something else produces a warning but not an error:
df_2 = data.frame(
v1_dbl = 1,
v2_lst = 1,
v3_fct = 1,
v4_raw = raw(1),
v5_dtm = as.POSIXct("2019-12-01")
)
head(rbind(df_1, df_2), 3)
v1_dbl v2_lst v3_fct v4_raw v5_dtm
1 1 1 b 00 2019-12-02
2 2 2 h 00 2019-12-08
3 3 3 c 00 2019-12-09
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1) : invalid factor level, NA generated
# Fails on the lst, num combination:
head(dplyr::bind_rows(df_1, df_2), 3)
Error: Column `v2_lst` can't be converted from list to numeric
# Fails on the fct, num combination:
head(dplyr::bind_rows(df_1[-2], df_2), 3)
Error: Column `v3_fct` can't be converted from factor to numeric
base::rbind keeps rownames
The tidyverse advocates turning rownames into a dedicated column, so its functions drop them.
rbind(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
mpg cyl disp hp
Mazda RX4 21.0 6 160 110
Mazda RX4 Wag 21.0 6 160 110
Datsun 710 22.8 4 108 93
Hornet 4 Drive 21.4 6 258 110
dplyr::bind_rows(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
mpg cyl disp hp
1 21.0 6 160 110
2 21.0 6 160 110
3 22.8 4 108 93
4 21.4 6 258 110
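If you want to keep the car names while still using dplyr::bind_rows, the tidyverse way is to move them into an ordinary column first. A small sketch, assuming dplyr and tibble (which provides rownames_to_column()) are installed; the column name "car" is an arbitrary choice:

```r
library(dplyr)
library(tibble)

# Move the rownames into a regular column named "car" before binding
top <- rownames_to_column(mtcars[1:2, 1:4], var = "car")
bottom <- rownames_to_column(mtcars[3:4, 1:4], var = "car")

with_names <- bind_rows(top, bottom)
with_names$car
# "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive"
```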
base::rbind cannot cope with missing columns
This is listed just for completeness, since Abhilash Kandwal already pointed it out in their answer.
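If you have to stay in base R, one workaround is to pad each data frame with NA columns before calling rbind. The helper rbind_fill_na() below is hypothetical (it is not part of base R); it is only a sketch, reusing the a and b data frames from the earlier example:

```r
# Hypothetical helper (not a base R function): adds any missing columns as NA,
# puts all columns in the same order, then row-binds with base::rbind
rbind_fill_na <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # rep(NA, nrow(df)) also works for zero-row data frames
    for (col in setdiff(all_cols, names(df))) df[[col]] <- rep(NA, nrow(df))
    df[all_cols]
  })
  do.call(rbind, padded)
}

a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)
res <- rbind_fill_na(a, b)
res
```

This should give the same values as bind_rows(a, b), but it does no type reconciliation beyond what rbind itself does, so bind_rows remains the more robust choice.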
base::rbind handles named arguments differently
While base::rbind prepends argument names to rownames, dplyr::bind_rows has the option to add a dedicated ID column:
rbind(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])
mpg cyl disp hp
hi.Mazda RX4 21.0 6 160 110
hi.Mazda RX4 Wag 21.0 6 160 110
bye.Datsun 710 22.8 4 108 93
bye.Hornet 4 Drive 21.4 6 258 110
dplyr::bind_rows(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4], .id = "my_id")
my_id mpg cyl disp hp
1 hi 21.0 6 160 110
2 hi 21.0 6 160 110
3 bye 22.8 4 108 93
4 bye 21.4 6 258 110
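For comparison, a rough base-R imitation of the .id column is to add the ID yourself before binding. A sketch only; the list names hi/bye and the column name my_id just mirror the example above:

```r
parts <- list(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])

# Prepend an ID column to each piece, taking the ID from the list names
with_id <- Map(function(df, id) cbind(my_id = id, df), parts, names(parts))

# unname() stops rbind from also prefixing the rownames with "hi." / "bye."
res <- do.call(rbind, unname(with_id))
res
```

Unlike bind_rows, this keeps the original car names as rownames.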
base::rbind makes vector arguments into rows (and recycles them)
In contrast, dplyr::bind_rows adds columns (and therefore requires the elements of x to be named):
rbind(mtcars[1:2, 1:4], x = 1:2)
mpg cyl disp hp
Mazda RX4 21 6 160 110
Mazda RX4 Wag 21 6 160 110
x 1 2 1 2
dplyr::bind_rows(mtcars[1:2, 1:4], x = c(a = 1, b = 2))
mpg cyl disp hp a b
1 21 6 160 110 NA NA
2 21 6 160 110 NA NA
3 NA NA NA NA 1 2
base::rbind is slower and requires more RAM
To bind a hundred medium-sized data frames (1k rows each), base::rbind requires over 40 times more RAM and is more than 12 times slower:
dfs = rep(list(df_1), 100)
bench::mark(
"base::rbind" = do.call(rbind, dfs),
"dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]
# A tibble: 2 x 5
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
1 base::rbind 47.23ms 48.05ms 20.0 104.48MB
2 dplyr::bind_rows 3.69ms 3.75ms 261. 2.39MB
Since I needed to bind lots of small data frames, here is a benchmark for that too. Both the speed difference (roughly 80 times) and especially the RAM difference (thousands of times) are quite striking:
dfs = rep(list(df_1[1:2, ]), 10^4)
bench::mark(
"base::rbind" = do.call(rbind, dfs),
"dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]
# A tibble: 2 x 5
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
1 base::rbind 1.65s 1.65s 0.605 1.56GB
2 dplyr::bind_rows 19.31ms 20.21ms 43.7 566.69KB
Finally, help("rbind") and help("bind_rows") are interesting to read, too.