Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most useful R trick? [closed]

Tags:

r

str() tells you the structure of any object.


One very useful function I often use is dput(), which allows you to dump an object in the form of R code.

# Use the iris data set
R> data(iris)
# dput of a numeric vector
R> dput(iris$Petal.Length)
c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 
1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1, 1.7, 1.9, 
1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 
1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 
4.5, 4.9, 4, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4, 4.7, 
3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4, 4.9, 4.7, 4.3, 4.4, 4.8, 
5, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4, 
4.4, 4.6, 4, 3.3, 4.2, 4.2, 4.2, 4.3, 3, 4.1, 6, 5.1, 5.9, 5.6, 
5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5, 5.1, 5.3, 5.5, 
6.7, 6.9, 5, 5.7, 4.9, 6.7, 4.9, 5.7, 6, 4.8, 4.9, 5.6, 5.8, 
6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 
5.9, 5.7, 5.2, 5, 5.2, 5.4, 5.1)
# dput of a factor levels
R> dput(levels(iris$Species))
c("setosa", "versicolor", "virginica")

It can be very useful to post easily reproducible data chunks when you ask for help, or to edit or reorder the levels of a factor.


head() and tail() to get the first and last parts of a dataframe, vector, matrix, function, etc. Especially with large data frames, this is a quick way to check that it has loaded ok.


One nice feature: Reading data uses connections which can be local files, remote files accessed via http, pipes from other programs or more.

As a simple example, consider this access for N=10 random integers between min=100 and max=200 from random.org (which supplies true random numbers based on atmospheric noise rather than a pseudo random number generator):

R> site <- "http://random.org/integers/"         # base URL
R> query <- "num=10&min=100&max=200&col=2&base=10&format=plain&rnd=new"
R> txt <- paste(site, query, sep="?")            # concat url and query string
R> nums <- read.table(file=txt)                  # and read the data
R> nums                                          # and show it
   V1  V2
1 165 143
2 107 118
3 103 132
4 191 100
5 138 185
R>

As an aside, the random package provides several convenience functions for accessing random.org.


I find I am using with() and within() more and more. No more $ littering my code and one doesn't need to start attaching objects to the search path. More seriously, I find with() etc make the intention of my data analysis scripts much clearer.

> df <- data.frame(A = runif(10), B = rnorm(10))
> A <- 1:10 ## something else hanging around...
> with(df, A + B) ## I know this will use A in df!
 [1]  0.04334784 -0.40444686  1.99368816  0.13871605 -1.17734837
 [6]  0.42473812  2.33014226  1.61690799  1.41901860  0.8699079

with() sets up an environment within which the R expression is evaluated. within() does the same thing but allows you to modify the data object used to create the environment.

> df <- within(df, C <- rpois(10, lambda = 2))
> head(df)
           A          B C
1 0.62635571 -0.5830079 1
2 0.04810539 -0.4525522 1
3 0.39706979  1.5966184 3
4 0.95802501 -0.8193090 2
5 0.76772541 -1.9450738 2
6 0.21335006  0.2113881 4

Something I didn't realise when I first used within() is that you have to do an assignment as part of the expression evaluated and assign the returned object (as above) to get the desired effect.