class() returns multiple class names in R

Tags: object, class, r

All, I am a beginner in R. I am not too familiar with how classes are organized in R. I have noticed that some class() calls return one class-type, while others return multiple class names.

Example 1

{My object name is "sassign"} Here's my data:

 acctnum gender state   zip zip3 first last book_ nonbook_ total_ purch child youth cook do_it refernce art geog buyer
1   10001      M    NY 10605  106    49   29   109      248    357    10     3     2    2     0        1   0    2    no
2   10002      M    NY 10960  109    39   27    35      103    138     3     0     1    0     1        0   0    1    no
3   10003      F    PA 19146  191    19   15    25      147    172     2     0     0    2     0        0   0    0    no
4   10004      F    NJ 07016  070     7    7    15      257    272     1     0     0    0     0        1   0    0    no
5   10005      F    NY 10804  108    15   15    15      134    149     1     0     0    1     0        0   0    0    no
6   10006      F    NY 11366  113     7    7    15       98    113     1     0     1    0     0        0   0    0   yes

Now, if I do class(object) above, I get:

class(sassign)
[1] "data.frame"

I am good with this. I understand that this data structure is of type data frame.

Example 2

Now, I recently came across Wickham's tibbleR package. Here's how I converted the data frame to a tibble:

tib_sassign<-as_data_frame(sassign)
class(tib_sassign)
[1] "tbl_df"     "tbl"        "data.frame"

This is where I got lost. I do not know the difference between tbl_df and tbl. My hypothesis is that the tibble package makes our life easier by returning objects (similar to abstract classes) that can be used as a tibble ("tbl"), a data frame ("data.frame") or a tbl_df (I have no clue what tbl_df means). I read through the dplyr package's online PDF, but I don't think it explains this. I believe it assumes that people know what the above means.

I read RStudio's blog post at https://blog.rstudio.org/2016/03/24/tibble-1-0-0/ but I don't think it describes what the above output means. I also read Norman Matloff's book, but I don't think this is covered. I also googled "tbl_df" "tbl" "data.frame", but most of the results were about some piece of code not working. I couldn't find an explanation of what the above output means.

Example 3

I have now started to look at time series in R. This is the point at which I had to start this thread. Here's what I did:

t_sassign <-data.frame(group_by(sassign,last))
t_sassign<-ts(t_sassign,start = c(2014,1),frequency = 12)
class(t_sassign)
[1] "mts"    "ts"     "matrix"

Here, "last" is the number of months. While I believe I will somehow manage to do what I need to do, I still don't understand what the above result means.

I also searched through StackOverflow, but most of the results talk about returning Class in Java.

I have three questions:

Question 1) It would be awesome if someone could provide an example so that I can understand the output from class().

Question 2) I'd also appreciate it if someone could provide a snippet that applies the concept discussed in question 1. This way, I can register this concept in my brain forever.

Question 3) If you know a book that goes into such concepts, I'd appreciate a pointer. I am following R in Action by Kabacoff, R by Norman Matloff and StackOverflow.

Many thanks in advance for your help.


(Added) Here's another confusing thing: When I did:

AP<-AirPassengers
class(AP)
[1] "ts"

I got "ts" as the class. No inherited classes were shown. I am really lost. Please help me!

watchtower asked Aug 05 '16 17:08


1 Answer

The tbl_df and tbl classes aren't from base R; they are a feature of what is often referred to as the 'hadleyverse'. Hadley has designed the dplyr package to work with a special version of data frames. See http://www.rdocumentation.org/packages/tibble/versions/1.1/topics/tibble-package for a description of the tbl_df class. As described there, that class has versions of print, "[", and "[[" that differ from the base R functions that would normally handle data frames: the printing format and rules are different, $ and [[ never do partial name matching, and subsetting always returns a data.frame.
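To see how a class vector with more than one name behaves, here is a minimal sketch in base R. The class name my_tbl and its print method are made up for illustration; they are not part of tibble or dplyr, and this is only an approximation of the idea, not how tbl_df is actually implemented. R tries the class names from left to right when it looks for a method, so the first name gets the first chance and the later names act as fallbacks.

```r
# An ordinary data frame with one extra (hypothetical) class name in front.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
class(df) <- c("my_tbl", class(df))
class(df)                      # "my_tbl" "data.frame"

# A print method for the made-up class. Dispatch finds it first because
# "my_tbl" is the first element of class(df).
print.my_tbl <- function(x, ...) {
  cat("<my_tbl with", nrow(x), "rows>\n")
  NextMethod()                 # hand over to the next class: print.data.frame
  invisible(x)
}

print(df)                      # custom header, then the usual data frame print

# There is no summary.my_tbl, so summary() falls back to summary.data.frame.
summary(df)

# The object still counts as a data frame everywhere.
inherits(df, "data.frame")     # TRUE
is.data.frame(df)              # TRUE
```

Read this way, c("tbl_df", "tbl", "data.frame") means: use a tbl_df method if one exists, otherwise a tbl method, otherwise the data.frame method.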

Re: a separate description of the tbl class. What I have found so far suggests that the dplyr package docs are the place to look, since dplyr has as.tbl and describes different methods for different kinds of data sources such as SQL servers.

A correction: that package is NOT named tibbleR. It is named tibble.

For your last question (noting that multipart questions are frowned on in SO): you can see that inherits will sometimes, but not always, tell you whether an object is a member of an "implicit" class, and that you may need to use an is.* function to test for 'numeric':

> AP<-AirPassengers
> class(AP)
[1] "ts"
> inherits(AP, "matrix")
[1] FALSE
> inherits(AP, "numeric")
[1] FALSE
> str(AP)
 Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "numeric")
[1] FALSE
> inherits( as.matrix(AP), "matrix")
[1] TRUE
> str( as.matrix(AP) )
 num [1:144, 1] 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "integer")
[1] FALSE
> is.numeric( as.matrix(AP) )
[1] TRUE
> ?inherits
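As a rough sketch of why AirPassengers reports only "ts" while the object in Example 3 reports "mts" "ts" "matrix": ts() returns a univariate series when given a vector and a multivariate one when given a matrix or data frame. The values below are made up for illustration. Recent R versions may list an extra "array" entry after "matrix", so only the first three names are noted here.

```r
# Univariate series: built from a plain numeric vector.
uni <- ts(c(112, 118, 132, 129), start = c(1949, 1), frequency = 12)
class(uni)                 # "ts"

# Multivariate series: built from a two-column matrix.
multi <- ts(matrix(1:8, ncol = 2), start = c(2014, 1), frequency = 12)
class(multi)               # begins with "mts" "ts" "matrix"

# A multivariate series is still a "ts", but not the other way round.
inherits(multi, "ts")      # TRUE
inherits(uni, "mts")       # FALSE

# The is.* functions look at the underlying data, not the class attribute.
is.matrix(multi)           # TRUE
is.numeric(uni)            # TRUE
```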
IRTFM answered Oct 12 '22 23:10