On the web, I found that rbind() is used to combine two data frames by rows, and that the bind_rows() function from dplyr performs the same task. What is the difference between these two functions, and which one is more efficient?


Difference between rbind() and bind_rows() in R

2 Answers

Among other differences, one of the main reasons to use bind_rows over rbind is to combine two data frames that have different numbers of columns. rbind throws an error in such a case, whereas bind_rows fills the columns that are missing from one of the data frames with NA in the rows that come from that data frame.

Try out the following code to see the difference:

a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)

Results for the two calls are as follows:

rbind(a, b)
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

library(dplyr)
bind_rows(a, b)
  a b c  d
1 1 3 5 NA
2 2 4 6 NA
3 7 2 3  8
4 8 3 4  9
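For readers who want to stay in base R, the NA-filling behaviour shown above can be approximated by padding each data frame with the columns it lacks before calling rbind. This is only a minimal sketch: pad_columns is a hypothetical helper written for illustration, not a function from base R or dplyr, and it does not attempt the type checks that bind_rows performs.

```r
a <- data.frame(a = 1:2, b = 3:4, c = 5:6)
b <- data.frame(a = 7:8, b = 2:3, c = 3:4, d = 8:9)

# Hypothetical helper (not from base R or dplyr): add every column that
# `df` lacks, fill it with NA, and return the columns in a fixed order.
pad_columns <- function(df, all_cols) {
  for (col in setdiff(all_cols, names(df))) {
    df[[col]] <- NA
  }
  df[all_cols]
}

all_cols <- union(names(a), names(b))

# Both inputs now share the same columns, so rbind() no longer errors.
combined <- rbind(pad_columns(a, all_cols), pad_columns(b, all_cols))
print(combined)
```

The rows that came from `a` carry NA in column `d`, which is the same shape of result that bind_rows returns.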


answered Oct 21 '22 14:10

Abhilash Kandwal

Since none of the answers here offers a systematic review of the differences between base::rbind and dplyr::bind_rows, and the answer from @bob regarding performance is incorrect, I decided to add the following.

Let's create a test data frame:

df_1 = data.frame(
  v1_dbl = 1:1000,
  v2_lst = I(as.list(1:1000)),
  v3_fct = factor(sample(letters[1:10], 1000, replace = TRUE)),
  v4_raw = raw(1000),
  v5_dtm = as.POSIXct(paste0("2019-12-0", sample(1:9, 1000, replace = TRUE)))
)

df_1$v2_lst = unclass(df_1$v2_lst) #remove the AsIs class introduced by `I()`

1. `base::rbind` handles list inputs differently

rbind(list(df_1, df_1))
     [,1]   [,2]  
[1,] List,5 List,5

# You have to combine it with `do.call()` to achieve the same result:
head(do.call(rbind, list(df_1, df_1)), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09

head(dplyr::bind_rows(list(df_1, df_1)), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09
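The difference is easier to see on a tiny input. The sketch below uses a made-up two-row data frame (the names df, dfs, wrapped and stacked are illustrative only) and inspects what each call returns, using base R alone.

```r
df <- data.frame(id = 1:2, value = c("x", "y"))
dfs <- list(df, df)

# The list is treated as ONE argument, so rbind() builds a 1 x 2 list-matrix
# whose cells hold the two data frames.
wrapped <- rbind(dfs)

# do.call() spreads the list elements out as separate arguments, so the
# data frame method of rbind() is used and the rows are stacked.
stacked <- do.call(rbind, dfs)

is.matrix(wrapped)     # a matrix, not a data frame
dim(wrapped)
is.data.frame(stacked)
nrow(stacked)
```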

2. `base::rbind` can cope with (some) mixed types

While both base::rbind and dplyr::bind_rows fail when trying to bind, e.g., a raw or datetime column to a column of some other type, base::rbind can cope with some degree of discrepancy.

Combining a list and a non-list column produces a list column. Combining a factor and something else produces a warning but not an error:

df_2 = data.frame(
  v1_dbl = 1,
  v2_lst = 1,
  v3_fct = 1,
  v4_raw = raw(1),
  v5_dtm = as.POSIXct("2019-12-01")
)

head(rbind(df_1, df_2), 3)
  v1_dbl v2_lst v3_fct v4_raw     v5_dtm
1      1      1      b     00 2019-12-02
2      2      2      h     00 2019-12-08
3      3      3      c     00 2019-12-09
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1) : invalid factor level, NA generated

# Fails on the lst, num combination:
head(dplyr::bind_rows(df_1, df_2), 3)
Error: Column `v2_lst` can't be converted from list to numeric

# Fails on the fct, num combination:
head(dplyr::bind_rows(df_1[-2], df_2), 3)
Error: Column `v3_fct` can't be converted from factor to numeric
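One way to sidestep both the warning and the error is to bring the mismatched columns to a common type before binding. The following is a minimal sketch under that assumption; the data frames fct_df and num_df are made up for illustration, and converting to character is just one possible choice of common type.

```r
fct_df <- data.frame(g = factor(c("a", "b")))
num_df <- data.frame(g = 1)

# Convert both columns to character explicitly, so neither function has to
# guess how a factor and a number should be combined.
fct_df$g <- as.character(fct_df$g)
num_df$g <- as.character(num_df$g)

bound <- rbind(fct_df, num_df)
str(bound)
```

Since both columns are now plain character vectors, the same two data frames should also be acceptable to dplyr::bind_rows.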

3. `base::rbind` keeps rownames

Tidyverse advocates making rownames into a dedicated column, so its functions drop them.

rbind(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
                mpg cyl disp  hp
Mazda RX4      21.0   6  160 110
Mazda RX4 Wag  21.0   6  160 110
Datsun 710     22.8   4  108  93
Hornet 4 Drive 21.4   6  258 110

dplyr::bind_rows(mtcars[1:2, 1:4], mtcars[3:4, 1:4])
   mpg cyl disp  hp
1 21.0   6  160 110
2 21.0   6  160 110
3 22.8   4  108  93
4 21.4   6  258 110
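If the row names carry information, they can be copied into an ordinary column before binding, which is the approach the tidyverse recommends (tibble::rownames_to_column() does this). Below is a base R sketch of the same idea; keep_names is a hypothetical helper and the column name "model" is an arbitrary choice.

```r
top <- mtcars[1:2, 1:4]
bottom <- mtcars[3:4, 1:4]

# Hypothetical helper: move the row names into a regular column called
# `model` and reset the row names to the default 1..n sequence.
keep_names <- function(df) {
  out <- data.frame(model = rownames(df), df, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

combined <- rbind(keep_names(top), keep_names(bottom))
print(combined)
```

Because the names now live in a column, they survive whichever binding function is used afterwards.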

4. `base::rbind` cannot cope with missing columns

Just for completeness, since Abhilash Kandwal already said so in their answer.

5. `base::rbind` handles named arguments differently

While base::rbind prepends argument names to rownames, dplyr::bind_rows has the option to add a dedicated ID column:

rbind(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])
                    mpg cyl disp  hp
hi.Mazda RX4       21.0   6  160 110
hi.Mazda RX4 Wag   21.0   6  160 110
bye.Datsun 710     22.8   4  108  93
bye.Hornet 4 Drive 21.4   6  258 110

dplyr::bind_rows(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4], .id = "my_id")
  my_id  mpg cyl disp  hp
1    hi 21.0   6  160 110
2    hi 21.0   6  160 110
3   bye 22.8   4  108  93
4   bye 21.4   6  258 110
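Something close to the .id column can be built in base R by tagging each data frame with its name before stacking. This is a rough sketch, not an exact equivalent of bind_rows: the names parts, tagged and combined are illustrative, and the original row names are kept rather than dropped.

```r
parts <- list(hi = mtcars[1:2, 1:4], bye = mtcars[3:4, 1:4])

# Add the list name as a `my_id` column to each piece.
tagged <- Map(
  function(df, id) data.frame(my_id = id, df, stringsAsFactors = FALSE),
  parts,
  names(parts)
)

# unname() prevents rbind() from prepending "hi." / "bye." to the row names.
combined <- do.call(rbind, unname(tagged))
print(combined)
```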

6. `base::rbind` makes vector arguments into rows (and recycles them)

In contrast, dplyr::bind_rows adds columns (and therefore requires the elements of x to be named):

rbind(mtcars[1:2, 1:4], x = 1:2)
              mpg cyl disp  hp
Mazda RX4      21   6  160 110
Mazda RX4 Wag  21   6  160 110
x               1   2    1   2

dplyr::bind_rows(mtcars[1:2, 1:4], x = c(a = 1, b = 2))
  mpg cyl disp  hp  a  b
1  21   6  160 110 NA NA
2  21   6  160 110 NA NA
3  NA  NA   NA  NA  1  2
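To avoid relying on either convention, a named vector can be turned into a one-row data frame first, so that its elements are matched to columns by name. A minimal sketch, with made-up values in new_row:

```r
base_df <- mtcars[1:2, 1:4]

# Made-up values, named after the existing columns.
new_row <- c(mpg = 30, cyl = 4, disp = 100, hp = 90)

# as.list() keeps the names, so as.data.frame() yields one row with
# columns mpg, cyl, disp and hp.
row_df <- as.data.frame(as.list(new_row))

combined <- rbind(base_df, row_df)
print(combined)
```

With an explicit one-row data frame there is no recycling, and a vector of the wrong length or with the wrong names fails loudly instead of being silently reused.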

7. `base::rbind` is slower and requires more RAM

To bind a hundred medium-sized data frames (1k rows each), base::rbind requires roughly 44 times more RAM and is about 13 times slower in the benchmark below:

dfs = rep(list(df_1), 100)
bench::mark(
  "base::rbind" = do.call(rbind, dfs),
  "dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]

# A tibble: 2 x 5
  expression            min   median `itr/sec` mem_alloc
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 base::rbind       47.23ms  48.05ms      20.0  104.48MB
2 dplyr::bind_rows   3.69ms   3.75ms     261.     2.39MB

Since I needed to bind lots of small data frames, here is a benchmark for that too. The difference in speed, and especially in RAM, is quite striking:

dfs = rep(list(df_1[1:2, ]), 10^4)
bench::mark(
  "base::rbind" = do.call(rbind, dfs),
  "dplyr::bind_rows" = dplyr::bind_rows(dfs)
)[, 1:5]

# A tibble: 2 x 5
  expression            min   median `itr/sec` mem_alloc
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 base::rbind         1.65s    1.65s     0.605    1.56GB
2 dplyr::bind_rows  19.31ms  20.21ms    43.7    566.69KB
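The numbers above depend on the machine and on package versions, so it is worth repeating the comparison locally. The sketch below is a lighter alternative that uses only system.time() from base R and runs the dplyr part only if dplyr is installed; the data frame small and the list size of 100 are arbitrary choices, and a single system.time() call is far less precise than bench::mark.

```r
small <- data.frame(id = 1:2, value = c(0.5, 1.5))
dfs <- rep(list(small), 100)

# Time the base R approach.
base_time <- system.time(base_result <- do.call(rbind, dfs))
print(base_time)

# Time dplyr only when it is available, and check that both approaches
# stacked the same number of rows.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_time <- system.time(dplyr_result <- dplyr::bind_rows(dfs))
  print(dplyr_time)
  stopifnot(nrow(dplyr_result) == nrow(base_result))
}
```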

Finally, help("rbind") and help("bind_rows") are interesting to read, too.

answered Oct 21 '22 15:10

jakub


Tags: r, dplyr, rbind

asad_hussain
