Why do R objects not print in a function or a "for" loop?

I have an R matrix named ddd. When I enter this, everything works fine:

y <- 1
shapiro.test(ddd[,y])
ad.test(ddd[,y]) 
stem(ddd[,y]) 
print(y) 

The calls to Shapiro-Wilk, Anderson-Darling, and stem all work, and all use the same column.

If I put this code in a "for" loop, the calls to Shapiro-Wilk and Anderson-Darling stop working, while the stem-and-leaf call and the print call continue to work.

for (y in 7:10) {
    shapiro.test(ddd[,y])
    ad.test(ddd[,y]) 
    stem(ddd[,y]) 
    print(y)
}

The decimal point is 1 digit(s) to the right of the |

  0 | 0
  0 | 899999
  1 | 0

[1] 7

The same thing happens if I try to write a function: SW and AD do not work, but the other calls do.

> D <- function (y) {
+ shapiro.test(ddd[,y])
+ ad.test(ddd[,y]) 
+ stem(ddd[,y]) 
+ print(y)  }

> D(9)

  The decimal point is at the |

   9 | 000
   9 | 
  10 | 00000

[1] 9

Why don't all the calls behave the same way?

Sal Leggio asked Jan 17 '11 17:01




3 Answers

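In a loop, automatic printing is turned off, just as it is inside a function body (only the value a function returns is auto-printed at the console). You need to explicitly print something in both cases if you want to see the output. The [1] 9 lines you are getting come from explicitly printing the value of y. stem() behaves differently because it writes its display straight to the console as a side effect (and returns NULL invisibly), whereas shapiro.test() and ad.test() only return a test object, whose summary appears only when that object is auto-printed or passed to print().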
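A minimal sketch of that behaviour, using made-up data. capture.output() is used here only to collect what would have appeared on the console; it prints visible results much as the top-level console does, so it shows which calls produce output and which do not.

```r
# Toy data, made up for illustration
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)

f <- function(v) {
  v * 2   # computed, then discarded: never printed
  v + 1   # last value: returned visibly, so the console prints it
}

in_loop   <- capture.output(for (i in 1:3) i * 10)         # nothing: loop values are not auto-printed
explicit  <- capture.output(for (i in 1:3) print(i * 10))  # "[1] 10" "[1] 20" "[1] 30"
from_fun  <- capture.output(f(5))                          # "[1] 6": only the return value shows
sw_loop   <- capture.output(for (i in 1) shapiro.test(x))  # nothing: the test object is discarded
sw_print  <- capture.output(for (i in 1) print(shapiro.test(x)))  # full test summary
stem_loop <- capture.output(for (i in 1) stem(x))          # stem() writes to the console itself
```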

Here is an example of how you might go about doing this.

> DF <- data.frame(A = rnorm(100), B = rlnorm(100))
> y <- 1
> shapiro.test(DF[,y])

    Shapiro-Wilk normality test

data:  DF[, y] 
W = 0.9891, p-value = 0.5895

So at the top level we get automatic printing. In the loop we have to print explicitly:

for(y in 1:2) {
    print(shapiro.test(DF[,y]))
}

If you want to print more tests, just add them as extra lines in the loop:

library(nortest)  # assumed source of ad.test(); install.packages("nortest") if needed
for(y in 1:2) {
    writeLines(paste("Shapiro-Wilk test for column", y))
    print(shapiro.test(DF[,y]))
    writeLines(paste("Anderson-Darling test for column", y))
    print(ad.test(DF[,y]))
}
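The same explicit-print fix applies to the D() function from the question. The sketch below uses a made-up matrix in place of ddd, and calls nortest::ad.test() only if the nortest package (assumed to be where the question's ad.test() comes from) is installed. Returning the Shapiro-Wilk object with invisible() means that calling D() at the console does not print it a second time, while the result can still be stored.

```r
# Made-up stand-in for the question's matrix ddd: 12 rows, 4 numeric columns
set.seed(42)
ddd <- matrix(rnorm(48, mean = 9.5, sd = 0.5), ncol = 4)

D <- function(y) {
  sw <- shapiro.test(ddd[, y])
  print(sw)                                  # explicit print, so it shows inside a function
  if (requireNamespace("nortest", quietly = TRUE)) {
    print(nortest::ad.test(ddd[, y]))        # Anderson-Darling, only if nortest is installed
  }
  stem(ddd[, y])                             # prints on its own, as in the question
  invisible(sw)                              # return the test object without re-printing it
}

res <- D(2)            # prints the tests and the stem-and-leaf display
for (y in 1:4) D(y)    # works in a loop too, because every print is explicit
```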

But that isn't very appealing unless you like reading through reams of output. Instead, why not save the fitted test objects? You can then print and investigate them, or even process them to aggregate the test statistics and p-values into a table. You can do that using a loop:

## list in which to save the fitted test objects
obj <- vector(mode = "list", length = 2)
## loop
for(y in seq_along(obj)) {
    obj[[y]] <- shapiro.test(DF[,y])
}

We can then look at the test results using

> obj[[1]]

    Shapiro-Wilk normality test

data:  DF[, y] 
W = 0.9891, p-value = 0.5895

for example. Alternatively, use lapply(), which takes care of setting up the object that stores the results for us:

> obj2 <- lapply(DF, shapiro.test)
> obj2[[1]]

    Shapiro-Wilk normality test

data:  X[[1L]] 
W = 0.9891, p-value = 0.5895

Now say we want to extract the W statistics and p-values. We can process the object storing all the results to pull out the bits we want, e.g.:

> tab <- t(sapply(obj2, function(x) c(x$statistic, x$p.value)))
> colnames(tab) <- c("W", "p.value")
> tab
          W      p.value
A 0.9890621 5.894563e-01
B 0.4589731 1.754559e-17

Or for those with a penchant for significance stars:

> tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), 
+                                    `p.value` = x$p.value))
> tab2 <- data.frame(do.call(rbind, tab2))
> printCoefmat(tab2, has.Pvalue = TRUE)
       W p.value    
A 0.9891  0.5895    
B 0.4590  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This has got to be better than firing output to the screen that you then have to pore through.

Gavin Simpson answered Oct 09 '22 02:10


Not a new answer, but in addition to the above: flush.console() may be needed to force printing to take place DURING the loop rather than after it, on consoles that buffer output (such as the Windows and macOS R GUIs). The only reason I use print() during a loop is to show progress, e.g., when reading many files.

for (i in 1:10) {
  print(i)
  flush.console()             # push any buffered console output to the screen now
  for(j in 1:100000)
    k <- 0                    # busy-work standing in for a slow task
}
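If the goal is only to show progress, a text progress bar from the utils package that ships with R is another option. This is a sketch with Sys.sleep() standing in for real work (for example, reading one file per iteration); n and the sleep time are arbitrary illustrative values.

```r
n  <- 10
pb <- txtProgressBar(min = 0, max = n, style = 3)  # draws a bar on the console
for (i in seq_len(n)) {
  Sys.sleep(0.05)            # stand-in for real work, e.g. reading one file
  setTxtProgressBar(pb, i)   # redraw the bar: i of n iterations done
}
close(pb)                    # finish the bar and move to a new line
```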
J. Win. answered Oct 09 '22 02:10


Fantastic answer from Gavin Simpson. I took the last bit of magic and turned it into a function.

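sw.df <- function ( data ) { 
   obj <- lapply(data, shapiro.test)              # one Shapiro-Wilk test per column
   tab <- lapply(obj, function(x) c(W = unname(x$statistic), `p.value` = x$p.value))
   tab <- data.frame(do.call(rbind, tab))         # one row per column: W and p.value
   printCoefmat(tab, has.Pvalue = TRUE)           # explicit print, with significance stars
}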

Then you can just call it with your data frame: sw.df(df)

And if you want to try a transformation: sw.df(log(df)) (log() only makes sense if every column is strictly positive; negative values give NaN).
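Following Gavin Simpson's point about keeping the results, here is a hypothetical variant, sw_tab(), that prints the same table but also returns it invisibly, so it can be stored and compared across transformations. The demo data frame is made up.

```r
# Hypothetical variant of sw.df(): prints the table AND returns it invisibly
sw_tab <- function(data) {
  obj <- lapply(data, shapiro.test)                  # one test per column
  tab <- data.frame(do.call(rbind, lapply(obj, function(x)
    c(W = unname(x$statistic), p.value = x$p.value))))
  printCoefmat(tab, has.Pvalue = TRUE)               # explicit print, so it also shows inside loops
  invisible(tab)                                     # keep the numbers for later use
}

set.seed(1)
df <- data.frame(A = rlnorm(50), B = runif(50, min = 1, max = 10))  # strictly positive, so log() is defined
res_raw <- sw_tab(df)
res_log <- sw_tab(log(df))
```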

Rolando answered Oct 09 '22 03:10