I've discovered some strange behaviour when subsetting dplyr tbl_df data frames. When I subset a data frame with the 'matrix' style df[,'a'],
it returns a vector, as expected. However, when I do the same thing with a tbl_df
data frame, it returns a data frame instead.
I've replicated it below using the iris data set.
Can someone explain why this is happening, or how I can convert these data frames back from tbl_df? I need to use dplyr and readr earlier in my workflow, before the point where I need this behaviour.
library(dplyr)
data(iris)
str(iris['Sepal.Length'])
#> 'data.frame': 150 obs. of 1 variable:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
str(iris[,'Sepal.Length'])
#> num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# tbl_df() is deprecated in current dplyr/tibble; as_tibble() is its replacement
iris <- tbl_df(iris)
str(iris[,'Sepal.Length'])
#> Classes ‘tbl_df’ and 'data.frame': 150 obs. of 1 variable:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
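A minimal, self-contained reproduction of the same behaviour, assuming a current tibble installation (it ships with dplyr) and using `as_tibble()` in place of the deprecated `tbl_df()`. The names `df`, `tb`, `x_df` and `x_tb` are just illustrative.

```r
# Assumes the tibble package is installed (it is a dependency of dplyr).
library(tibble)

df <- iris                 # plain data.frame
tb <- as_tibble(iris)      # tibble, class "tbl_df"

x_df <- df[, "Sepal.Length"]   # base data.frame drops to a numeric vector
x_tb <- tb[, "Sepal.Length"]   # tibble keeps a one-column tibble

is.numeric(x_df)     # TRUE
is.data.frame(x_tb)  # TRUE
ncol(x_tb)           # 1
```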
A tbl_df object is a data frame that provides a nicer printing method, which is useful when working with large data sets. The tibble R package, developed by Hadley Wickham, provides easy-to-use functions for creating tibbles, a modern rethinking of data frames.
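A short sketch of creating tibbles with the tibble package, assuming it is installed; `tb` and `tb_iris` are illustrative names. It shows that a tibble is still a data frame underneath.

```r
library(tibble)

# tibble() builds a tibble column by column, much like data.frame()
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# as_tibble() converts an existing data frame
tb_iris <- as_tibble(iris)

# A tibble still inherits from data.frame, so data.frame code keeps working
is.data.frame(tb)        # TRUE
inherits(tb, "tbl_df")   # TRUE
nrow(tb_iris)            # 150
```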
Subsetting in R is a useful indexing feature for accessing the elements of an object. It can be used to select variables and filter observations. You can use brackets to select rows and columns from your data frame.
Q4: Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]? A: When you subset a data frame with a single vector, it behaves the same way as subsetting a list of columns. So mtcars[1:20] tries to return the first 20 columns, but mtcars has only 11, so it fails with "undefined columns selected". By contrast, mtcars[1:20, ] uses matrix-style indexing and returns the first 20 rows with all columns.
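The difference can be checked directly with base R; `res` and `first20` are illustrative names, and `tryCatch()` is used only to capture the error message instead of stopping.

```r
# mtcars ships with base R: 32 rows (cars) and 11 columns (variables)
dim(mtcars)   # 32 11

# List-style subsetting selects columns; there are only 11, so asking for 20 fails
res <- tryCatch(mtcars[1:20], error = function(e) conditionMessage(e))
res           # "undefined columns selected"

# Matrix-style subsetting with an empty column index selects rows instead
first20 <- mtcars[1:20, ]
dim(first20)  # 20 11
```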
There are three subsetting operators: [[, [, and $. They interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames). Subsetting can also be combined with assignment.
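A small sketch of the three operators on a plain data frame, plus subsetting combined with assignment; the data frame `df` and its columns `a` and `b` are hypothetical.

```r
df <- data.frame(a = 1:3, b = c(10, 20, 30))

df["a"]     # `[` with one index: a one-column data frame
df[["a"]]   # `[[`: the column itself, as a vector
df$a        # `$`: shorthand for df[["a"]]

# Subsetting combined with assignment modifies only the selected element
df$b[2] <- 99
df$b        # 10 99 30
```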
This is on purpose.
See ?tbl_df:

Methods:
‘tbl_df’ implements two important base methods:
‘[’ Never simplifies (drops), so always returns data.frame

(emphasis added)
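The documentation quoted above is from an older dplyr. As far as I know, in current tibble (3.x) `[` still defaults to `drop = FALSE` but accepts `drop = TRUE` when selecting a single column with `tb[, j]`. A sketch of that and of other ways to get a plain vector, assuming tibble and dplyr are installed; `tb` and `v1` to `v4` are illustrative names.

```r
library(tibble)
library(dplyr)

tb <- as_tibble(iris)

v1 <- tb[, "Sepal.Length", drop = TRUE]  # opt in to base-style dropping
v2 <- tb[["Sepal.Length"]]               # `[[` always extracts the column
v3 <- tb$Sepal.Length                    # `$` does the same
v4 <- pull(tb, Sepal.Length)             # dplyr verb for extracting one column

identical(v1, v2) && identical(v2, v3) && identical(v3, v4)  # TRUE
```

`pull()` fits naturally at the end of a dplyr pipeline, while `[[` works without any extra package.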
If you run class(tbl_df(iris)), you will see that its class is "tbl_df", then "tbl", and finally "data.frame", so it can have a different [ method, and methods(class = 'tbl_df') indeed shows [.tbl_df.
(It's a bit like how data.tables in the data.table package have a different [ method too.)
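A sketch of the S3 dispatch described above, assuming a current tibble installation (with `as_tibble()` standing in for `tbl_df()`). `getS3method()` from base R's utils package looks up the registered method; `tb` and `m` are illustrative names.

```r
library(tibble)

tb <- as_tibble(iris)
class(tb)   # "tbl_df" "tbl" "data.frame"

# S3 dispatch tries each class in order, so `[` on a tibble finds `[.tbl_df`
# before it ever reaches `[.data.frame`
m <- getS3method("[", "tbl_df", optional = TRUE)
is.function(m)   # TRUE once the tibble namespace is loaded
```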
Edit: to un-tbl_df, just use data.frame, e.g. data.frame(tbl_df(iris)) will convert the tbl_df(..) back to a data.frame.
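A sketch of converting back, assuming tibble is installed; `as.data.frame()` is shown alongside `data.frame()` as an equivalent alternative, and `tb`, `df1`, `df2` are illustrative names. Once the plain class is restored, base `[` drops to a vector again.

```r
library(tibble)

tb <- as_tibble(iris)

df1 <- data.frame(tb)      # the approach from the answer
df2 <- as.data.frame(tb)   # the explicit coercion function

class(df1)   # "data.frame"
class(df2)   # "data.frame"

# With the plain class restored, base `[` drops to a vector again
is.numeric(df2[, "Sepal.Length"])   # TRUE
```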