
In the tidyverse, what is the difference between an object of class "tbl" and "tbl_df"?

Tags: r, dplyr, tibble

When creating a tibble,

tbl <- tibble(A=1:5, B=6:10)

the result of

class(tbl)

is

[1] "tbl_df"     "tbl"        "data.frame"

I'm used to seeing this as I use dplyr quite a bit. But when is an object just a "tbl" (and not a "tbl_df") or vice versa? I'd just like to know a bit more about the difference, if any.

Any documentation would be much appreciated!

Al R. asked Oct 25 '25 00:10


1 Answer

You can think of a "tibble" as an interface. If an object can respond to all the tibble actions, then you can treat it as a tibble. R's S3 class system is informal: a class is just a label in the object's class vector, not a strictly enforced type.

So tbl is the generic tibble, and tbl_df is a specific type of tibble that stores its data in a data.frame.
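
As a small sketch of that relationship (assuming the tibble package is installed; the variable names are only illustrative), you can inspect the class vector directly and test membership with inherits():

```r
library(tibble)

# A plain data.frame and the same data converted to a tibble
df <- data.frame(A = 1:5, B = 6:10)
tb <- as_tibble(df)

class(df)   # "data.frame"
class(tb)   # "tbl_df" "tbl" "data.frame"

# inherits() checks whether a class name appears in the class vector
is_generic_tbl <- inherits(tb, "tbl")      # TRUE
is_df_backed   <- inherits(tb, "tbl_df")   # TRUE
plain_is_tbl   <- inherits(df, "tbl")      # FALSE
```

The tibble carries both labels at once: "tbl_df" for the data.frame-backed implementation and "tbl" for the generic interface.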

There are other packages like dtplyr that allow you to act like a tibble but store your data in a data.table. For example, with versions of dtplyr before 1.0.0 (later releases replaced tbl_dt() with lazy_dt()):

library(dtplyr)
ds <- tbl_dt(mtcars)
class(ds)
# [1] "tbl_dt"     "tbl"        "data.table" "data.frame"

There's also the dbplyr package, which allows you to use a SQL database back end. For example:

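library(dplyr)
# RSQLite's dbConnect() takes `dbname`, not `path`, for the database location
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
# tbl() returns a reference to the remote table, not the data itself
cars_db <- tbl(con, "mtcars")
class(cars_db)
# [1] "tbl_dbi"  "tbl_sql"  "tbl_lazy" "tbl"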

So again we see that this thing can generally act as a tibble, but it has other classes so that it can try to do all its work in the database engine rather than manipulating the data in R itself.
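
To make "doing the work in the database" a bit more concrete, here is a minimal sketch, assuming dplyr, dbplyr, DBI and RSQLite are installed. The in-memory SQLite database and the names four_cyl and four_cyl_local are only for illustration:

```r
library(dplyr)

# Throwaway in-memory SQLite database (an assumption for this sketch)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars")
cars_db <- tbl(con, "mtcars")

# Nothing is computed yet: this only builds up a description of the query
four_cyl <- cars_db %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt)

show_query(four_cyl)   # prints the SQL that dbplyr generated for the pipeline

# collect() runs the query in SQLite and pulls the result into R
four_cyl_local <- collect(four_cyl)
class(four_cyl_local)  # "tbl_df" "tbl" "data.frame"

DBI::dbDisconnect(con)
```

The same dplyr verbs are used throughout; only the classes in front of "tbl" decide whether they run as SQL or on an in-memory data.frame.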

So there's not really a "difference" between tbl and tbl_df. The latter just says how the tibble is actually implemented, so that its behavior can be specialized (for example, optimized) for that storage.
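
The mechanism behind this is ordinary S3 dispatch, which walks the class vector from left to right. A rough base-R sketch follows; describe() is a hypothetical generic made up for this example (it is not part of dplyr or tibble), and the two objects are empty stand-ins that only carry the class vectors shown earlier:

```r
# Hypothetical generic, for illustration only
describe <- function(x, ...) UseMethod("describe")

# Method for anything that claims the generic "tbl" interface
describe.tbl <- function(x, ...) "some kind of tbl"

# More specific method for the data.frame-backed implementation
describe.tbl_df <- function(x, ...) {
  rest <- NextMethod()  # falls through to describe.tbl
  paste("data.frame-backed tibble, which is also", rest)
}

# Stand-ins with the same class vectors as a local tibble and a database table
local_tbl  <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))
remote_tbl <- structure(list(), class = c("tbl_dbi", "tbl_sql", "tbl_lazy", "tbl"))

describe(local_tbl)   # "data.frame-backed tibble, which is also some kind of tbl"
describe(remote_tbl)  # "some kind of tbl"
```

The more specific class gets the first chance to handle a call, and anything it does not handle falls back to the method for "tbl".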

For more information, you can check out the tibble vignette or the extending tibble vignette.

MrFlick answered Oct 26 '25 16:10