
After bind_rows() and rbind() on the same data.tables, why does identical() return FALSE?

Caveat: novice. I have several data.tables with millions of rows each; the variables are mostly dates and factors. I was using rbindlist() to combine them. Yesterday, after breaking the tables up into smaller pieces vertically (instead of the current horizontal splicing), I was trying to understand rbind() better (especially with fill = TRUE). I also tried bind_rows() and then tried to verify that the two results matched, but identical() returned FALSE.

```r
library(data.table)
library(dplyr)
DT1 <- data.table(a=1, b=2)
DT2 <- data.table(a=4, b=3)
DT_bindrows <- bind_rows(DT1,DT2)
DT_rbind <- rbind(DT1,DT2)
identical(DT_bindrows,DT_rbind)
# [1] FALSE
```

Visually inspecting the results from bind_rows() and rbind() suggests they are indeed identical. I read this and this (from where I adapted the example). My questions: (a) what am I missing, and (b) if the number, names, and order of my columns are the same, should I be concerned that identical() returns FALSE?
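Since the question mentions rbind() with fill = TRUE, here is a minimal sketch of what filling does when the tables do not share every column. DT3 and its column c are hypothetical, made up for illustration; both calls should pad the cells a table does not provide with NA.

```r
library(data.table)
library(dplyr)

DT1 <- data.table(a = 1, b = 2)
DT3 <- data.table(a = 5, c = 6)  # hypothetical table: has c but lacks b

# data.table's rbind() method matches columns by name and, with
# fill = TRUE, pads columns missing from a table with NA
filled_rbind <- rbind(DT1, DT3, fill = TRUE)

# bind_rows() also matches by name and fills missing columns with NA
filled_bindrows <- bind_rows(DT1, DT3)

filled_rbind
filled_bindrows
```

In both results the columns come out as a, b, c, with NA in b for the second row and NA in c for the first.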

asked Jul 25 '18 16:07 by armipunk


1 Answer

identical() also compares attributes, and the two results do not carry the same attributes. all.equal() has an option to skip the attribute check (check.attributes):

```r
all.equal(DT_bindrows, DT_rbind, check.attributes = FALSE)
#[1] TRUE
```

Comparing the str() output of the two tables makes the difference clear:

```r
str(DT_bindrows)
# Classes ‘data.table’ and 'data.frame': 2 obs. of  2 variables:
# $ a: num  1 4
# $ b: num  2 3
str(DT_rbind)
# Classes ‘data.table’ and 'data.frame': 2 obs. of  2 variables:
# $ a: num  1 4
# $ b: num  2 3
# - attr(*, ".internal.selfref")=<externalptr> # reference attribute
```
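To see every attribute directly instead of reading it off str(), the attribute names of the two results can be listed side by side. This is a small sketch that rebuilds the example from the question; which attributes the bind_rows() result carries may depend on the installed dplyr version, so no output is shown here.

```r
library(data.table)
library(dplyr)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(a = 4, b = 3)
DT_bindrows <- bind_rows(DT1, DT2)
DT_rbind <- rbind(DT1, DT2)

# Names of all attributes on each result (str() only shows some of them)
attrs_bindrows <- names(attributes(DT_bindrows))
attrs_rbind <- names(attributes(DT_rbind))
attrs_bindrows
attrs_rbind

# Attribute names present on only one of the two results
union(setdiff(attrs_rbind, attrs_bindrows), setdiff(attrs_bindrows, attrs_rbind))
```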

Setting that attribute to NULL makes identical() return TRUE:

```r
attr(DT_rbind, ".internal.selfref") <- NULL
identical(DT_bindrows, DT_rbind)
#[1] TRUE
```
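For part (b), one way to compare the tables while ignoring only this attribute, without modifying either original, is to drop .internal.selfref from copies of both results before calling identical(). strip_selfref() is a hypothetical helper written for this sketch, not a data.table function. Stripping both sides, rather than only the rbind() result, also covers the case where the bind_rows() result carries the attribute too.

```r
library(data.table)
library(dplyr)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(a = 4, b = 3)
DT_bindrows <- bind_rows(DT1, DT2)
DT_rbind <- rbind(DT1, DT2)

# Hypothetical helper: returns a copy of x without data.table's internal
# self-reference attribute. Base attr<- modifies a copy inside the
# function, so the caller's table keeps its attribute.
strip_selfref <- function(x) {
  attr(x, ".internal.selfref") <- NULL
  x
}

identical(strip_selfref(DT_bindrows), strip_selfref(DT_rbind))
```

If this returns TRUE, the column values, names, class, and row names agree, and the only difference was the attribute that data.table uses internally to detect whether a table has been copied.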
answered Oct 20 '22 10:10 by akrun