
Trying to understand R structure: what does a dot in function names signify?

I am trying to learn how to use R. I can use it to do basic things like reading in data and running a t-test. However, I am struggling to understand the way R is structured (I have a very mediocre Java background).

What I don't understand is the way the functions are classified.

For example, in is.na(someVector), is "is" a class? Or for read.csv, is csv a method of the read class?

I need an easier way to learn the functions than simply memorizing them randomly. I like the idea of things belonging to other things. To me it seems like this gives a language a tree structure which makes learning more efficient.

Thank you

Sorry if this is an obvious question. I am genuinely confused and have been reading/watching quite a few tutorials.

NicoFish asked Feb 09 '15 04:02


1 Answer

Your confusion is entirely understandable, since R mixes two conventions of using (1) . as a general-purpose word separator (as in is.na(), which.min(), update.formula(), data.frame() ...) and (2) . as an indicator of an S3 method, method.class (i.e. foo.bar() would be the "foo" method for objects with class attribute "bar"). This makes functions like summary.data.frame() (i.e., the summary method for objects with class data.frame) especially confusing.
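As a minimal sketch of convention (2), the following defines a made-up generic and two methods for it. The names describe, describe.default, describe.reading, and the class "reading" are all hypothetical and exist only for illustration; UseMethod() and structure() are standard base R.

```r
# Hypothetical generic: UseMethod() picks a method based on class(x).
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) "a generic object"

# Method for the hypothetical class "reading": the name is <generic>.<class>.
describe.reading <- function(x, ...) paste("a reading of", unclass(x))

r <- structure(42, class = "reading")
describe(r)    # dispatches to describe.reading()
describe(1:3)  # no describe.integer(), so describe.default() is used
```

Here the dot in describe.reading carries meaning for dispatch, whereas the dot in a name like is.na is only part of the name.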

As @thelatemail points out above, there are some other sets of functions that repeat the same prefix for a variety of different options (as in read.table(), read.delim(), read.fwf() ...), but these are entirely conventional, not specified anywhere in the formal language definition.

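# All visible objects whose names contain a letter, a dot, then a letter
dotfuns <- apropos("[a-z]\\.[a-z]")
# Strip each dot-plus-letters segment, leaving (mostly) the leading prefix
dotstart <- gsub("\\.[a-zA-Z]+","",dotfuns)
head(dotstart)
# Count each prefix and show the ten most common
tt <- table(dotstart)
head(rev(sort(tt)),10)
##      as      is   print     Sys    file summary     dev  format     all     sys
##     118      51      32      18      17      16      16      15      14      13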

(Some of these are actually S3 generics, some are not. For example, Sys.*(), dev.*(), and file.*() are not.)
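One rough way to tell the two groups apart is to check whether a function of that name exists and whether its body calls UseMethod(). The helper calls_use_method below is a hypothetical name, not part of R, and the check is only a heuristic: it misses internal generics implemented as primitives (their body is NULL), so treat it as a sketch rather than a definitive test.

```r
# Hypothetical helper: TRUE if `name` is a function whose body mentions UseMethod().
calls_use_method <- function(name) {
  # Prefixes such as "Sys" or "dev" are not functions at all, only name stems.
  if (!exists(name, mode = "function")) return(FALSE)
  f <- get(name, mode = "function")
  any(grepl("UseMethod", deparse(body(f)), fixed = TRUE))
}

prefixes <- c("print", "summary", "format", "file", "Sys", "dev")
is_generic <- vapply(prefixes, calls_use_method, logical(1))
is_generic
```

In a standard R session, print, summary, and format should come out TRUE, while file, Sys, and dev should come out FALSE.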

Historically _ was used as a shortcut for the assignment operator <- (before = was available as a synonym), so it wasn't available as a word separator. I don't know offhand why camelCase wasn't adopted instead.

Confusingly, methods("is") returns is.na() among many others, but it is effectively just searching for functions whose names start with "is."; it warns that "function 'is' appears not to be generic".
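For a stricter check than methods(), utils::isS3method() (available since R 3.3.0) tries to split a dotted name into generic and class and reports whether it is actually an S3 method. The sketch below assumes a default session with the base, stats, and utils packages attached; the list of candidate names is just an illustration.

```r
# Names that all contain dots, but only some of which are real S3 methods.
candidates <- c("summary.data.frame", "is.na", "read.csv", "t.test", "data.frame")

# isS3method() returns TRUE only when some <generic>.<class> split is a method.
is_method <- vapply(candidates, function(nm) isS3method(nm), logical(1))
is_method
```

Only summary.data.frame should be reported as a method here. t.test is a well-known trap: t is a generic, but t.test is an ordinary function, not the t method for a class "test".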

Rasmus Bååth's presentation on naming conventions is informative and entertaining (if a little bit depressing).

extra credit: are there any dot-separated S3 method names, i.e. cases where a function name of the form x.y.z represents the x.y method for objects with class attribute z?

answer (from Hadley Wickham in comments): as.data.frame.data.frame() wins. as.data.frame is an S3 generic (unlike, say, as.numeric), and as.data.frame.data.frame is its method for data.frame objects. Its purpose (from ?as.data.frame):

If a data frame is supplied, all classes preceding ‘"data.frame"’ are stripped, and the row names are changed if that argument is supplied.
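The class-stripping behaviour quoted above can be seen in a short sketch. The class name "my_tbl" is hypothetical and stands in for any extra class placed in front of "data.frame".

```r
df <- data.frame(x = 1:3)
# "my_tbl" is a hypothetical extra class placed in front of "data.frame".
class(df) <- c("my_tbl", "data.frame")

# There is no as.data.frame.my_tbl(), so dispatch moves on to the next class
# and calls as.data.frame.data.frame().
plain <- as.data.frame(df)
class(plain)   # "data.frame"
```

The original object df keeps both classes; only the returned copy has the leading class removed.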

Ben Bolker answered Oct 15 '22 10:10
