
Why does subsetting change with tbl_df in dplyr?

Tags:

indexing

r

dplyr

I've discovered some strange behaviour when subsetting dplyr tbl_df data frames. When I subset a plain data frame 'matrix' style, df[,'a'], it returns a vector as expected. However, when I do the same thing on a tbl_df data frame, it returns a data frame instead.

I've replicated it below using the iris data set.

Can someone explain why this is happening, or how I can de-tbl_df a data frame? I need to use dplyr and readr in the steps leading up to this, so I can't simply avoid tbl_df.

library(dplyr)
data(iris)

str(iris['Sepal.Length'])
'data.frame':   150 obs. of  1 variable:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

str(iris[,'Sepal.Length'])
 num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

iris <- tbl_df(iris)

str(iris[,'Sepal.Length'])
Classes ‘tbl_df’ and 'data.frame':  150 obs. of  1 variable:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
Tom McMahon asked Jul 16 '15 00:07



1 Answer

This is on purpose.

See ?tbl_df:

Methods:

‘tbl_df’ implements two important base methods:

print Only prints the first 10 rows, and the columns that fit on screen

‘[’ Never simplifies (drops), so always returns data.frame

(emphasis added)

If you run class(tbl_df(iris)) you will see that its class is "tbl_df", then "tbl", and finally "data.frame". Because "tbl_df" comes first, it can supply its own [ method, and methods(class='tbl_df') indeed shows [.tbl_df.

(It's a bit like how data.tables in the data.table package have their own [ method too.)
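If the goal is only to get one column out as a vector, the tbl_df class does not have to be dropped. Below is a minimal sketch, assuming a recent dplyr in which as_tibble() stands in for the older tbl_df() and pull() is available (neither appears in the original question); the variable names are illustrative only.

```r
library(dplyr)

# Assumption: as_tibble() is used as the current replacement for tbl_df().
iris_tbl <- as_tibble(iris)

# Single-bracket subsetting keeps the data frame shape, as described above.
one_col <- iris_tbl[, "Sepal.Length"]
is.data.frame(one_col)

# List-style extraction returns the column itself, i.e. a plain vector.
v1 <- iris_tbl[["Sepal.Length"]]
v2 <- iris_tbl$Sepal.Length

# dplyr's pull() does the same and fits into a pipeline.
v3 <- pull(iris_tbl, Sepal.Length)

# Asking for the drop explicitly also works with current tibble versions.
v4 <- iris_tbl[, "Sepal.Length", drop = TRUE]
```

All four of v1 to v4 should hold the same numeric vector of length 150, so which one to use is mostly a matter of taste; [[ has the advantage of behaving identically on plain data frames.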


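Edit: to un-tbl_df, just use data.frame, e.g. data.frame(tbl_df(iris)) will convert the tbl_df(..) back to a plain data.frame. as.data.frame() does the same and, unlike data.frame(), does not run check.names on the column names.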
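As a rough illustration of the conversion, here is a sketch that uses as.data.frame() rather than data.frame(), and as_tibble() in place of tbl_df(); both substitutions are assumptions about a recent dplyr setup, not part of the original answer.

```r
library(dplyr)

# Assumption: as_tibble() plays the role of tbl_df() from the question.
iris_tbl <- as_tibble(iris)

# Convert back to a base data frame; the tbl_df and tbl classes are dropped.
iris_df <- as.data.frame(iris_tbl)
class(iris_df)

# Matrix-style subsetting of a single column simplifies to a vector again.
sepal <- iris_df[, "Sepal.Length"]
str(sepal)
```

After the conversion, the base [ method applies again, so its default drop = TRUE behaviour for a single column returns.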

mathematical.coffee answered Oct 10 '22 20:10