What is the difference between as.tibble(), as_data_frame(), and tbl_df()?

I remember reading somewhere that as.tibble() is an alias for as_data_frame(), but I don't know what exactly an alias is in programming terminology. Is it similar to a wrapper?

So I guess my question probably comes down to the difference in possible usages between tbl_df() and as_data_frame(): what are the differences between them, if any?

More specifically, given a (non-tibble) data frame df, I often turn it into a tibble by using:

df <- tbl_df(df)

Wouldn't

df <- as_data_frame(df)

do the same thing? If so, are there other cases where the two functions tbl_df() and as_data_frame() cannot be used interchangeably to get the same result?

The R documentation says that

tbl_df() forwards the argument to as_data_frame()

Does that mean that tbl_df() is a wrapper or alias for as_data_frame()? The R documentation doesn't seem to say anything about as.tibble(), and I forgot where I read that it was an alias for as_data_frame(). Also, apparently as_tibble() is another alias for as_data_frame().

If these four functions really are all the same function, what is the sense in giving one function four different names? Isn't that more confusing than helpful?

Chill2Macht asked May 12 '17 16:05


2 Answers

To answer your question of "whether it is confusing", I think so :).

as.tibble and as_tibble are the same; both are S3 generics that dispatch to the as_tibble methods:

> as.tibble
function (x, ...) 
{
    UseMethod("as_tibble")
}
<environment: namespace:tibble>
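
As a rough base-R sketch of what that UseMethod("as_tibble") call implies: a function with any name can dispatch to another generic's methods by passing that generic's name to UseMethod. The names describe and describe_alias below are hypothetical and are not part of tibble.

```r
# Hypothetical generic and methods, for illustration only.
describe <- function(x, ...) UseMethod("describe")
describe.data.frame <- function(x, ...) "data frame method"
describe.default <- function(x, ...) "default method"

# A second function that dispatches to the same describe.* methods,
# in the same way as.tibble dispatches to the as_tibble.* methods.
describe_alias <- function(x, ...) UseMethod("describe")

print(describe(data.frame(a = 1)))        # picks describe.data.frame
print(describe_alias(data.frame(a = 1)))  # same method, different entry point
print(describe_alias(1:3))                # falls back to describe.default
```

Both entry points end up in the same method, so the choice between them only affects which name you type.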

as_data_frame and tbl_df are not exactly the same; tbl_df calls as_data_frame:

> tbl_df
function (data) 
{
    as_data_frame(data)
}
<environment: namespace:dplyr>

Note tbl_df is in dplyr while as_data_frame is in the tibble package:

> as_data_frame
function (x, ...) 
{
    UseMethod("as_data_frame")
}
<environment: namespace:tibble>

but of course it calls the same function, so they are "the same", or aliases as you say.
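
On the alias versus wrapper question, here is a minimal base-R sketch of the distinction. An alias is a second name bound to the same function object, while a wrapper is a new function that forwards its arguments. The names shout_alias and shout_wrapper are made up for this example; toupper is just a convenient base function to wrap.

```r
# Alias: a second name for the very same function object.
shout_alias <- toupper

# Wrapper: a new function whose body forwards to the original,
# which is the shape of the tbl_df definition shown above.
shout_wrapper <- function(x) {
  toupper(x)
}

print(identical(shout_alias, toupper))    # same object
print(identical(shout_wrapper, toupper))  # different function, same behaviour
print(identical(shout_wrapper("tibble"), shout_alias("tibble")))
```

By this definition tbl_df is a wrapper rather than a strict alias, although for a caller the result is the same.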

Now, we can look at the differences between the generic methods as_tibble and as_data_frame. First, we look at the methods of each:

> methods(as_tibble)
[1] as_tibble.data.frame* as_tibble.default*    as_tibble.list* as_tibble.matrix*     as_tibble.NULL*      
[6] as_tibble.poly*       as_tibble.table*      as_tibble.tbl_df* as_tibble.ts*        
see '?methods' for accessing help and source code
> methods(as_data_frame)
[1] as_data_frame.data.frame* as_data_frame.default*  as_data_frame.grouped_df* as_data_frame.list*      
[5] as_data_frame.matrix*     as_data_frame.NULL*       as_data_frame.table*      as_data_frame.tbl_cube*  
[9] as_data_frame.tbl_df*    
see '?methods' for accessing help and source code

If you check out the source code for as_tibble, you can see that it also contains the definitions of many of the as_data_frame methods. as_tibble defines two additional methods which aren't defined for as_data_frame: as_tibble.ts and as_tibble.poly. I'm not really sure why they couldn't also be defined for as_data_frame.

as_data_frame has two additional methods, which are both defined in dplyr: as_data_frame.tbl_cube and as_data_frame.grouped_df.

as_data_frame.tbl_cube first converts with as.data.frame, which does weaker checking (yes, bear with me), and then calls as_data_frame on the result:

> getAnywhere(as_data_frame.tbl_cube)
function (x, ...) 
{
    as_data_frame(as.data.frame(x, ..., stringsAsFactors = FALSE))
}
<environment: namespace:dplyr>

while as_data_frame.grouped_df ungroups the data frame passed to it.

Overall, it seems that as_data_frame should be seen as providing additional functionality over as_tibble, unless you are dealing with ts or poly objects.

rsmith54 answered Oct 05 '22 06:10


According to the introduction to tibble, it seems like tibbles supersede tbl_df.

I’m pleased to announce tibble, a new package for manipulating and printing data frames in R. Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as “tibble diff”.

[...]This package extracts out the tbl_df class associated functions from dplyr.

To add to the confusion, tbl_df now calls as_tibble, which is the preferred alias for as_data_frame and as.tibble (see Hadley Wickham's comment on the issue, and the as_tibble docs):

> tbl_df
function (data) 
{
    as_tibble(data, .name_repair = "check_unique")
}

According to the help description of tbl_df(), it is deprecated and tibble::as_tibble() should be used instead. as_data_frame and as.tibble help pages both redirect to as_tibble.

When calling class on a tibble, the class name still shows up as tbl_df:

> as_tibble(mtcars) %>% class
[1] "tbl_df"     "tbl"        "data.frame"
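
If you want to test for a tibble in code rather than read the class vector, something like the sketch below should work. It assumes the tibble package is installed (the guard skips the example otherwise), and the data frame df is made up for illustration.

```r
# Assumption: the tibble package is available; otherwise nothing runs.
if (requireNamespace("tibble", quietly = TRUE)) {
  df <- data.frame(x = 1:3, y = c("a", "b", "c"))
  tb <- tibble::as_tibble(df)

  print(inherits(tb, "tbl_df"))      # the class is still named tbl_df
  print(inherits(tb, "data.frame"))  # a tibble is also a data frame
  print(tibble::is_tibble(tb))       # helper exported by tibble
  print(tibble::is_tibble(df))       # a plain data frame is not a tibble
}
```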
qwr answered Oct 05 '22 05:10