
Select rows without missing values in R

Tags:

r

I am new to R and to for loops. I am trying to resample my data with replacement and check whether each sample contains a collinear column. When it does, I want to record that iteration in a vector (baditr), print a line indicating that "colinearity is at iteration i", and have the code skip to the next iteration and continue running. For every other iteration, I would like the code to save the column sums in the corresponding row of the matrix.

My problem is that I am getting an NA for the bad iterations. My intent is for bad iterations to not be included in my matrix at all. Here is my code:

a0=rep(1,40)
a=rep(0:1,20)
b=c(rep(1,20),rep(0,20))
c0=c(rep(0,12),rep(1,28))
c1=c(rep(1,5),rep(0,35))
c2=c(rep(1,8),rep(0,32))
c3=c(rep(1,23),rep(0,17))
da=matrix(cbind(a0,a,b,c0,c1,c2,c3),nrow=40,ncol=7)
sing <- function(nrw){
  sm <- matrix(NA,nrow=nrw,ncol=ncol(da))
  baditr <- NULL
  for(i in 1:nrw){
    ind <- sample(1:nrow(da), nrow(da),replace =TRUE)
    smdat <- da[ind,]
    evals <- eigen(crossprod(smdat))$values
    if(any(abs(evals) < 1e-7)){
      baditr <- c(baditr,i)
      cat("singularity occurs at", paste(i),"\n")
      next
    }
    sm[i,] <- apply(smdat,2,sum)
  }
  return(sm)
}
sing(20)

I will get the following output:

singularity occurs at 9 
singularity occurs at 13 
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   40   23   22   25    5    8   26
 [2,]   40   20   18   30    4    7   22
 [3,]   40   19   24   28    6    7   25
 [4,]   40   19   22   30    6    9   26
 [5,]   40   12   26   26    8   13   30
 [6,]   40   17   16   27    7   10   19
 [7,]   40   20   17   33    3    5   19
 [8,]   40   22   19   28    4    9   23
 [9,]   NA   NA   NA   NA   NA   NA   NA
[10,]   40   21   24   28    3    6   27
[11,]   40   21   16   31    2    4   22
[12,]   40   21   21   26    3    6   23
[13,]   NA   NA   NA   NA   NA   NA   NA
[14,]   40   18   16   29    2    7   22
[15,]   40   24   18   30    6    9   21
[16,]   40   23   18   29    4    8   21
[17,]   40   17   25   25    3    8   29
[18,]   40   22   28   23    9   14   30
[19,]   40   25   23   25    7   11   30
[20,]   40   20   23   27    7   10   26

I would like my matrix to look like this:

singularity occurs at 9 
singularity occurs at 13 
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   40   23   22   25    5    8   26
 [2,]   40   20   18   30    4    7   22
 [3,]   40   19   24   28    6    7   25
 [4,]   40   19   22   30    6    9   26
 [5,]   40   12   26   26    8   13   30
 [6,]   40   17   16   27    7   10   19
 [7,]   40   20   17   33    3    5   19
 [8,]   40   22   19   28    4    9   23
[10,]   40   21   24   28    3    6   27
[11,]   40   21   16   31    2    4   22
[12,]   40   21   21   26    3    6   23
[14,]   40   18   16   29    2    7   22
[15,]   40   24   18   30    6    9   21
[16,]   40   23   18   29    4    8   21
[17,]   40   17   25   25    3    8   29
[18,]   40   22   28   23    9   14   30
[19,]   40   25   23   25    7   11   30
[20,]   40   20   23   27    7   10   26

As a fail-safe, I would also appreciate any advice on saving a certain number of iterations to a file (for example, 50 iterations) and then updating that file once the next batch of iterations is produced. That is, I save the first 50 iterations to a file, and once the second round of 50 iterations is produced, it is written to the same file, so that the file then holds 100 iterations.

Sorry for the long post. But thanks in advance.

Falcon-StatGuy asked Sep 13 '12 01:09


1 Answer

Before you return sm, you can filter out the rows containing NA values with complete.cases(). It would look something like sm[complete.cases(sm),]. The function returns a logical vector with TRUE for every row that has no missing values and FALSE otherwise, so indexing sm with it keeps only the complete rows.
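To see what complete.cases() does in isolation, here is a minimal sketch on a toy matrix. The names m, keep and m_clean are made up for this illustration and are not part of the question's code. The drop = FALSE argument is an optional extra: it keeps the result a matrix even if only one row survives.

```r
# Toy matrix, filled column-wise: row 2 is entirely NA,
# just like a skipped iteration in the question
m <- matrix(c(1, NA, 3,
              4, NA, 6), nrow = 3)

# One logical value per row: TRUE when the row has no missing values
keep <- complete.cases(m)
print(keep)

# Keep only the complete rows; drop = FALSE preserves the matrix shape
m_clean <- m[keep, , drop = FALSE]
print(m_clean)
```

Without drop = FALSE, a result with a single surviving row would collapse to a plain vector, which may matter if later code expects a matrix.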

Also, it doesn't look like you are doing anything with baditr after defining it. I can comment out all lines referring to baditr and your function seems to work just fine... maybe it's a leftover from an older iteration of your code?
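If you do want to keep track of the bad iterations, one option is to return baditr alongside the filtered matrix. The following is only a sketch under my own assumptions: the name sing2, the tol argument, the column names, and the idea of labelling rows with their iteration number are mine, not something in your original code.

```r
# Same design matrix as in the question, built with named columns
da <- cbind(a0 = rep(1, 40), a = rep(0:1, 20), b = rep(c(1, 0), each = 20),
            c0 = rep(c(0, 1), c(12, 28)), c1 = rep(c(1, 0), c(5, 35)),
            c2 = rep(c(1, 0), c(8, 32)), c3 = rep(c(1, 0), c(23, 17)))

# Hypothetical variant of sing(): returns the clean matrix AND the bad iterations
sing2 <- function(nrw, tol = 1e-7) {
  # Row names carry the iteration number so it survives the filtering
  sm <- matrix(NA_real_, nrow = nrw, ncol = ncol(da),
               dimnames = list(as.character(seq_len(nrw)), colnames(da)))
  baditr <- integer(0)
  for (i in seq_len(nrw)) {
    ind <- sample(nrow(da), nrow(da), replace = TRUE)
    smdat <- da[ind, , drop = FALSE]
    evals <- eigen(crossprod(smdat), symmetric = TRUE, only.values = TRUE)$values
    if (any(abs(evals) < tol)) {
      baditr <- c(baditr, i)          # remember the skipped iteration
      cat("singularity occurs at", i, "\n")
      next
    }
    sm[i, ] <- colSums(smdat)
  }
  list(sm = sm[complete.cases(sm), , drop = FALSE], baditr = baditr)
}

res <- sing2(20)
res$baditr         # iterations that were skipped
rownames(res$sm)   # iteration numbers of the rows that were kept
```

Because the row names are set before filtering, the returned matrix keeps labels such as 8, 10, 11 when iteration 9 is dropped, which is close to the layout shown in the question.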

Update

Here's your updated function using complete.cases(). Note I also commented out everything related to baditr to illustrate that it's not doing anything currently in your code.

sing <- function(nrw){
  sm <- matrix(NA,nrow=nrw,ncol=ncol(da))
  #baditr <- NULL
  for(i in 1:nrw){
    ind <- sample(1:nrow(da), nrow(da),replace =TRUE)
    smdat <- da[ind,]
    evals <- eigen(crossprod(smdat))$values
    if(any(abs(evals) < 1e-7)){
      #baditr <- c(baditr,i)
      cat("singularity occurs at", paste(i),"\n")
      next
    }
    sm[i,] <- apply(smdat,2,sum)
  }
  return(sm[complete.cases(sm),])
}

Now let's run the function. I'm wrapping dim() around the function call, which will tell us the number of rows and columns of the resulting object:

> dim(sing(20))
singularity occurs at 6 
[1] 19  7

So one singularity and a matrix of 19 rows and 7 columns. Am I missing something?

As to your other question about writing things out, are you aware of the append parameter to write.table() and friends? The help page tells us: "If TRUE, the output is appended to the file. If FALSE, any existing file of the name is destroyed."

Update 2

Here's an example using append = TRUE in write.table()

#Matrix 1 definition and write to file
x <- matrix(1:9, ncol = 3)
write.table(x, "out.txt", sep = "\t", col.names = TRUE, row.names = FALSE)
#Matrix 2 definition and write to same file with append = TRUE
x2 <- matrix(10:18, ncol = 3)
write.table(x2, "out.txt", sep = "\t", col.names = FALSE, row.names = FALSE, append = TRUE)
#read consolidated data back in to check if it's right
x3 <- read.table("out.txt", header = TRUE)
x3

Results in

  V1 V2 V3
1  1  4  7
2  2  5  8
3  3  6  9
4 10 13 16
5 11 14 17
6 12 15 18
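
Applied to the "save every 50 iterations" idea from the question, the same append pattern could be wrapped in a small helper. This is only a sketch: append_batch, the temporary file path, and the random stand-in data are placeholders I made up, so swap in your real results and file name.

```r
# Hypothetical helper: append one batch of rows to a tab-separated file,
# writing the header only when the file does not exist yet
append_batch <- function(batch, path) {
  first <- !file.exists(path)
  write.table(batch, path, sep = "\t", row.names = FALSE,
              col.names = first, append = !first)
}

out <- tempfile(fileext = ".txt")   # placeholder path for this sketch
batch_size <- 50

for (b in 1:2) {
  # Stand-in for 50 iterations of real results (7 columns, as in the question)
  batch <- matrix(sample(0:40, batch_size * 7, replace = TRUE), ncol = 7)
  append_batch(batch, out)
}

# Read everything back to confirm both batches are in the file
saved <- read.table(out, header = TRUE)
dim(saved)   # 100 rows, 7 columns
```

Checking file.exists() means the first call creates the file with a header and every later call only adds rows, so a crash partway through loses at most the batch in progress.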

Chase answered Sep 29 '22 07:09