
What can a data frame do that a tibble cannot?

Fans of the Tidyverse regularly cite several advantages of tibbles over data frames. Most of these seem designed to protect the user from making mistakes. For example, unlike data frames, tibbles:

  • Don't need a drop = FALSE argument to keep the dimensions of your data when subsetting.
  • Will not let the $ operator do partial matching for column names.
  • Only recycle your input vectors if they are of exactly length one.
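
The three differences listed above can be seen side by side in a minimal sketch. The toy data (df, tb, and the columns value and label) is made up for illustration and assumes only that the tibble package is installed:

```r
library(tibble)

# Hypothetical toy data, used only for this illustration.
df <- data.frame(value = 1:3, label = c("a", "b", "c"))
tb <- as_tibble(df)

# 1. Single-column subsetting: a data frame drops to a vector unless
#    drop = FALSE is given; a tibble stays a tibble.
is.data.frame(df[, "value"])
#> [1] FALSE
is.data.frame(df[, "value", drop = FALSE])
#> [1] TRUE
is.data.frame(tb[, "value"])
#> [1] TRUE

# 2. Partial matching with $: a data frame silently matches "val" to "value";
#    a tibble returns NULL (with a warning, suppressed here).
partial_df <- df$val
partial_tb <- suppressWarnings(tb$val)
is.null(partial_df)
#> [1] FALSE
is.null(partial_tb)
#> [1] TRUE

# 3. Recycling: data.frame() recycles a length-2 vector to 4 rows;
#    tibble() only recycles vectors of length one and otherwise errors.
recycled_df <- data.frame(x = 1:4, y = 1:2)
recycled_tb <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) e)
scalar_tb <- tibble(x = 1:4, y = 0)
nrow(recycled_df)
#> [1] 4
inherits(recycled_tb, "error")
#> [1] TRUE
nrow(scalar_tb)
#> [1] 4
```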

I'm steadily becoming convinced that I should replace all of my data frames with tibbles. What are the primary disadvantages of doing so? More specifically, what can a data frame do that a tibble cannot?

Preemptively, I would like to make it clear that I am not asking about data.table or any big-picture objections to the Tidyverse. I am strictly asking about tibbles and data frames.

asked Mar 03 '21 at 23:03 by J. Mini



1 Answer

From "The trouble with tibbles":

there isn’t really any trouble with tibbles

However,

Some older packages don’t work with tibbles because of their alternative subsetting method. They expect tib[,1] to return a vector, when in fact it will now return another tibble.

This is what @Henrik pointed out in the comments.

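For example, length() gives different results. A single-column subset of a tibble is still a tibble, so its length is the number of columns (1), whereas the same subset of a data frame drops to a vector of 32 values: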

library(tibble)
tibblecars <- as_tibble(mtcars)
tibblecars[,"cyl"]
#> # A tibble: 32 x 1
#>      cyl
#>    <dbl>
#>  1     6
#>  2     6
#>  3     4
#>  4     6
#>  5     8
#>  6     6
#>  7     8
#>  8     4
#>  9     4
#> 10     6
#> # ... with 22 more rows
length(tibblecars[,"cyl"])
#> [1] 1
mtcars[,"cyl"]
#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
length(mtcars[,"cyl"])
#> [1] 32
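
When a plain vector is what you need, a tibble can still supply one without being converted. A short sketch, recreating tibblecars from the example above and using only base-style extraction ($, [[ and drop = TRUE):

```r
library(tibble)

tibblecars <- as_tibble(mtcars)

# Three ways to get the column as a vector rather than a one-column tibble.
cyl_dollar <- tibblecars$cyl
cyl_double <- tibblecars[["cyl"]]
cyl_drop   <- tibblecars[, "cyl", drop = TRUE]

length(cyl_double)
#> [1] 32

# Each matches what the data frame returns for mtcars[, "cyl"].
identical(cyl_dollar, mtcars[, "cyl"])
#> [1] TRUE
identical(cyl_drop, mtcars[, "cyl"])
#> [1] TRUE
```

This only helps in code you control; a package function that calls df[, 1] internally still sees a tibble.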

Another example:

  • base::reshape not working with tibbles

"Invariants for subsetting and subassignment" explains where the behaviour of tibbles differs from that of data.frame.

With these limitations in mind, the solution Hadley gives for interacting with legacy code is:

A handful of functions don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data frame:
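
A minimal sketch of that conversion. The function legacy_first_column_mean() is a hypothetical stand-in for older code that assumes df[, 1] is a vector; it does not come from any real package:

```r
library(tibble)

tibblecars <- as_tibble(mtcars)

# Hypothetical stand-in for an older function that expects df[, 1]
# to return a vector rather than a data frame.
legacy_first_column_mean <- function(df) {
  first <- df[, 1]
  stopifnot(is.numeric(first))
  mean(first)
}

# With a tibble, df[, 1] is a one-column tibble, so the numeric check fails.
tibble_result <- tryCatch(
  legacy_first_column_mean(tibblecars),
  error = function(e) e
)
inherits(tibble_result, "error")
#> [1] TRUE

# Converting back to a plain data frame restores the behaviour the function expects.
plain <- as.data.frame(tibblecars)
class(plain)
#> [1] "data.frame"
plain_result <- legacy_first_column_mean(plain)
```

Here plain_result is the mean of the mpg column, the same value as mean(mtcars$mpg).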

answered Oct 11 '22 at 16:10 by Waldi