What can a data frame do that a tibble cannot?

Tags:

dataframe

r

tibble

Fans of the Tidyverse regularly give several advantages of using tibbles rather than data frames. Most of them seem designed to protect the user from making mistakes. For example, unlike data frames, tibbles:

- Don't need a drop = FALSE argument to avoid dropping dimensions from your data.
- Will not let the $ operator do partial matching on column names.
- Only recycle your input vectors if they are of exactly length one.
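
A minimal sketch of these three differences, assuming the tibble package is installed. The toy data and the names df, tb, abc and xyz are made up for illustration:

```r
library(tibble)

# Toy data: the same two columns as a data frame and as a tibble.
df <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))
tb <- as_tibble(df)

# 1. Single-column subsetting: the data frame drops to a vector,
#    the tibble stays a tibble.
is.data.frame(df[, "abc"])
#> [1] FALSE
is.data.frame(tb[, "abc"])
#> [1] TRUE

# 2. Partial matching with `$`: the data frame matches "ab" to "abc",
#    the tibble returns NULL (and warns about an unknown column).
df_partial <- df$ab
tb_partial <- suppressWarnings(tb$ab)

# 3. Recycling: data.frame() recycles the length-2 vector to length 4,
#    tibble() refuses anything that is not length one or full length.
df_recycled <- data.frame(x = 1:4, y = 1:2)
tb_recycled <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
```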

I'm steadily becoming convinced to replace all of my data frames with tibbles. What are the primary disadvantages of doing so? More specifically, what can a data frame do that a tibble cannot?

Preemptively, I would like to make it clear that I am not asking about data.table or any big-picture objections to the Tidyverse. I am strictly asking about tibbles and data frames.


asked Mar 03 '21 23:03

J. Mini

1 Answer

From "The trouble with tibbles", you can read:

there isn’t really any trouble with tibbles

However,

Some older packages don’t work with tibbles because of their alternative subsetting method. They expect tib[,1] to return a vector, when in fact it will now return another tibble.

This is what @Henrik pointed out in comments.

For example, length() does not return the same result, because tibblecars[,"cyl"] is a one-column tibble while mtcars[,"cyl"] is a plain vector:

library(tibble)
tibblecars <- as_tibble(mtcars)
tibblecars[,"cyl"]
#> # A tibble: 32 x 1
#>      cyl
#>    <dbl>
#>  1     6
#>  2     6
#>  3     4
#>  4     6
#>  5     8
#>  6     6
#>  7     8
#>  8     4
#>  9     4
#> 10     6
#> # ... with 22 more rows
length(tibblecars[,"cyl"])
#> [1] 1
mtcars[,"cyl"]
#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
length(mtcars[,"cyl"])
#> [1] 32
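
If the code only needs the column as a vector, a tibble can still provide one. The following is a sketch of three ways to extract it, assuming a reasonably recent version of tibble; the names v1, v2 and v3 are just for illustration:

```r
library(tibble)

tibblecars <- as_tibble(mtcars)

# Three ways to get the column as a plain vector instead of a 32 x 1 tibble.
v1 <- tibblecars[["cyl"]]               # list-style extraction
v2 <- tibblecars$cyl                    # exact-name `$` extraction
v3 <- tibblecars[, "cyl", drop = TRUE]  # explicit drop, as with a data frame

length(v1)
#> [1] 32
```

The catch is that the caller has to ask for the vector explicitly, which older package code written against data frames does not do.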

Another example:

- base::reshape not working with tibbles

"Invariants for subsetting and subassignment" explains where the behaviour of tibble differs from that of data.frame.

With these limitations known, the solution given by Hadley in "interacting with legacy code" is:

A handful of functions don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data frame:
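
As a rough sketch of that workaround: legacy_mean_first_col() below is a hypothetical stand-in for such a function, not taken from any real package. It fails on a tibble and works once the tibble is converted back:

```r
library(tibble)

tibblecars <- as_tibble(mtcars)

# Hypothetical legacy function: assumes df[, 1] is a plain vector.
legacy_mean_first_col <- function(df) {
  first <- df[, 1]
  stopifnot(is.atomic(first))  # fails for a tibble, where df[, 1] is a tibble
  mean(first)
}

# On the tibble the assumption breaks; capture the error instead of stopping.
on_tibble <- tryCatch(
  legacy_mean_first_col(tibblecars),
  error = function(e) e
)

# Converting back to a data frame restores the behaviour the function expects.
on_data_frame <- legacy_mean_first_col(as.data.frame(tibblecars))
```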

answered Oct 11 '22 16:10

Waldi
