Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Deleting columns based on the value of a row

Tags:

dataframe

r

row

Given two data frames:

 C1<-c(3,4,4,4,5)
 C2<-c(3,7,3,4,5)
 C3<-c(5,6,3,7,4)
 DF<-data.frame(C1=C1,C2=C2,C3=C3)
 DF
   C1 C2 C3
1  3  3  5
2  4  7  6
3  4  3  3
4  4  4  7
5  5  5  4

and

V1<-c(3,2,2,4,5)
V2<-c(3,7,3,5,2)
V3<-c(5,2,5,7,5)
V4<-c(1,1,2,3,4)
V5<-c(1,2,6,7,5)
DF2<-data.frame(V1=V1,V2=V2,V3=V3,V4=V4,V5=V5)
DF2
  V1 V2 V3 V4 V5
1  3  3  5  1  1
2  2  7  2  1  2
3  2  3  5  2  6
4  4  5  7  3  7
5  5  2  5  4  5

Looking at each equivalent row in both data frames, there is a relationship between the value in C3 and the number of columns I want to drop in that same row in DF2.

The relationship between the value in C3 and the # of columns in DF2 to drop looks like this

If C3≥7 drop V5
If C3=6.0:6.9 drop V4 and up (so basically V5,V4)
If C3=5.0:5.9 drop V3 and up (so basically V5,V4,V3)
If C3=4.0:4.9 drop V2 and up (so basically V5,V4,V3,V2)
If C3≤3.9 drop entire row

For this example, based on the values of C3, I would want DF2 to look like this

  V1 V2 V3 V4 V5
1  3  3
2  2  7  2

4  4  5  7  3 
5  5

I've tried write a simple script to do this (I'm pretty new so I like to keep things simple so I can see what's going on) but I'm throwing errors left and right so I'd appreciate some advice on how to proceed

like image 439
Vinterwoo Avatar asked Jun 15 '12 03:06

Vinterwoo


2 Answers

I like Koshke's answer, but if your rules for setting to NA don't have a nice mathematical property to them or you need to define your rules arbitrarily, this approach should give you that flexibility. First, define a function that returns the columns to drop based on your rules:

f <- function(x) {
  if(x >= 7){
    out <- 5
  }else if(x >= 6.0){
           out <- 4:5
  } else if( x >= 5.0){
    out <- 3:5
  } else if (x >= 4.0){
    out <- 2:5
  } else {
    out <- 1:5
  }
  return(out)
}

Next, create a list for the column indices to drop:

z <- lapply(DF$C3, f)

Finally, loop through each row setting the corresponding columns to NA:

for(j in seq(length(z))){
  DF2[j, z[[j]]] <- NA
}

#-----
  V1 V2 V3 V4 V5
1  3  3 NA NA NA
2  2  7  2 NA NA
3 NA NA NA NA NA
4  4  5  7  3 NA
5  5 NA NA NA NA
like image 164
Chase Avatar answered Nov 03 '22 00:11

Chase


Perhaps the easiest way is like:

DF3 <- DF2
for (i in seq_len(nrow(DF3))) {
  DF3[i, seq_len(ncol(DF3)) >= DF[i, ]$C3 - 2] <- NA
}
DF3

then,

> DF3
  V1 V2 V3 V4 V5
1  3  3 NA NA NA
2  2  7  2 NA NA
3 NA NA NA NA NA
4  4  5  7  3 NA
5  5 NA NA NA NA
like image 44
kohske Avatar answered Nov 03 '22 00:11

kohske