I remember reading somewhere that as.tibble() is an alias for as_data_frame(), but I don't know what exactly an alias is in programming terminology. Is it similar to a wrapper?
So I guess my question probably comes down to the difference in possible usages between tbl_df() and as_data_frame(): what are the differences between them, if any?
More specifically, given a (non-tibble) data frame df, I often turn it into a tibble by using:
df <- tbl_df(df)
Wouldn't
df <- as_data_frame(df)
do the same thing? If so, are there other cases where the two functions tbl_df() and as_data_frame() cannot be used interchangeably to get the same result?
The R documentation says that tbl_df() forwards its argument to as_data_frame(). Does that mean that tbl_df() is a wrapper or alias for as_data_frame()? The R documentation doesn't seem to say anything about as.tibble(), and I forgot where I read that it was an alias for as_data_frame(). Also, apparently as_tibble() is another alias for as_data_frame().
If these four functions really are all the same function, what is the sense in giving one function four different names? Isn't that more confusing than helpful?
To answer your question of "whether it is confusing": I think so :).
as.tibble and as_tibble are the same; both simply dispatch on the S3 generic as_tibble:
> as.tibble
function (x, ...)
{
UseMethod("as_tibble")
}
<environment: namespace:tibble>
as_data_frame and tbl_df are not exactly the same; tbl_df calls as_data_frame:
> tbl_df
function (data)
{
as_data_frame(data)
}
<environment: namespace:dplyr>
Note that tbl_df is in dplyr, while as_data_frame is in the tibble package:
> as_data_frame
function (x, ...)
{
UseMethod("as_data_frame")
}
<environment: namespace:tibble>
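but of course it calls the same function, so in practice they are "the same", or aliases as you say. Strictly speaking, tbl_df is a thin wrapper around as_data_frame rather than a second name for it.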
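Since the question asks what an alias is and whether it is similar to a wrapper, here is a minimal base R sketch of the distinction. The function names (to_upper, to_upper_alias, to_upper_wrapper) are hypothetical and only for illustration; they are not part of tibble or dplyr.

```r
# Hypothetical helper, used only to illustrate the terminology.
to_upper <- function(x) toupper(x)

# Alias: a second name bound to the very same function object.
to_upper_alias <- to_upper

# Wrapper: a new function whose body forwards its argument to the original,
# in the same way the tbl_df() shown above forwards to as_data_frame().
to_upper_wrapper <- function(x) to_upper(x)

same_object_alias   <- identical(to_upper_alias, to_upper)    # TRUE
same_object_wrapper <- identical(to_upper_wrapper, to_upper)  # FALSE
same_result         <- identical(to_upper_wrapper("abc"), to_upper("abc"))  # TRUE
```

From the caller's point of view both give the same result; the difference only matters when you inspect the function itself or when the wrapper adds or fixes arguments.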
Now, we can look at the differences between the generics as_tibble and as_data_frame. First, we look at the methods of each:
> methods(as_tibble)
[1] as_tibble.data.frame* as_tibble.default* as_tibble.list* as_tibble.matrix* as_tibble.NULL*
[6] as_tibble.poly* as_tibble.table* as_tibble.tbl_df* as_tibble.ts*
see '?methods' for accessing help and source code
> methods(as_data_frame)
[1] as_data_frame.data.frame* as_data_frame.default* as_data_frame.grouped_df* as_data_frame.list*
[5] as_data_frame.matrix* as_data_frame.NULL* as_data_frame.table* as_data_frame.tbl_cube*
[9] as_data_frame.tbl_df*
see '?methods' for accessing help and source code
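If S3 dispatch is unfamiliar, the following base R sketch may help in reading the listings above: a generic built on UseMethod() chooses a method based on the class of its first argument and falls back to the default method otherwise. The generic describe and its methods are hypothetical names made up for this illustration.

```r
# Hypothetical generic: dispatches on the class of x, like as_tibble() does.
describe <- function(x, ...) UseMethod("describe")

# Method used when x inherits from "data.frame".
describe.data.frame <- function(x, ...) "data.frame method"

# Fallback used when no more specific method exists.
describe.default <- function(x, ...) "default method"

res_df  <- describe(mtcars)  # mtcars has class "data.frame"
res_int <- describe(1:3)     # no integer method is defined, so the default runs
```

Each entry in the methods() output above, such as as_tibble.data.frame or as_tibble.default, plays the same role for its generic.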
If you check out the code for as_tibble, you can see the definitions for many of the as_data_frame methods as well. as_tibble defines two additional methods which aren't defined for as_data_frame: as_tibble.ts and as_tibble.poly. I'm not really sure why they couldn't also be defined for as_data_frame.
as_data_frame has two additional methods, which are both defined in dplyr: as_data_frame.tbl_cube and as_data_frame.grouped_df.
as_data_frame.tbl_cube uses the weaker checking of as.data.frame (yes, bear with me) and then calls as_data_frame:
> getAnywhere(as_data_frame.tbl_cube)
function (x, ...)
{
as_data_frame(as.data.frame(x, ..., stringsAsFactors = FALSE))
}
<environment: namespace:dplyr>
while as_data_frame.grouped_df ungroups the passed data frame.
Overall, it seems that as_data_frame should be seen as providing additional functionality over as_tibble, unless you are dealing with ts or poly objects.
According to the introduction to tibble, it seems like tibbles supersede tbl_df.
I’m pleased to announce tibble, a new package for manipulating and printing data frames in R. Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as “tibble diff”. [...] This package extracts out the tbl_df class associated functions from dplyr.
To add to the confusion, tbl_df now calls as_tibble, which is the preferred alias for as_data_frame and as.tibble (see Hadley Wickham's comment on the issue, and the as_tibble docs):
> tbl_df
function (data)
{
as_tibble(data, .name_repair = "check_unique")
}
According to the help description of tbl_df(), it is deprecated and tibble::as_tibble() should be used instead. The as_data_frame and as.tibble help pages both redirect to as_tibble.
When calling class on a tibble, the class name still shows up as tbl_df:
> as_tibble(mtcars) %>% class
[1] "tbl_df" "tbl" "data.frame"
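Putting this together for the original use case, a small sketch of converting a plain data frame with the recommended as_tibble(), assuming the tibble package is installed. The data frame df and its columns are made up for the example.

```r
library(tibble)

# A plain (non-tibble) data frame; the contents are arbitrary.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Preferred conversion, in place of tbl_df(df) or as_data_frame(df).
tb <- as_tibble(df)

tb_classes    <- class(tb)                 # "tbl_df" "tbl" "data.frame"
tb_is_tibble  <- is_tibble(tb)             # TRUE
still_a_df    <- inherits(tb, "data.frame") # TRUE: a tibble is still a data frame
df_was_tibble <- is_tibble(df)             # FALSE
```

Because the result still inherits from data.frame, code that expects an ordinary data frame generally keeps working, subject to the printing and subsetting differences that tibbles introduce.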