Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove row based on condition of row above or below

Tags:

r

I have a dataframe as follows:

     chr   leftPos        ZScore1    ZScore2    ZScore3    ZScore4
      1     24352           34         43          19         43
      1     53534           2          1           -1         -9
      2      34            -15         7           -9         -18
      3     3443           -100        -4          4          -9
      3     3445           -100        -1          6          -1
      3     3667            5          -5          9           5
      3      7882          -8          -9          1           3

I would like to only keep those rows which have the same chr and have adjacent columns with the ZScore going in the same direction. In other words a row should remain if the row before or after for that chr has the same sign (positive or negative). I would like this to run for all columns with ZS in the column name so that the output ends up just being a number of rows that fulfil the criteria for each row.

For one column the code should result in:

     chr   leftPos         ZScore
      1     24352           34
      1     53534           2
      3     3443           -100
      3     3445           -100

but the final output should look like

         ZScore1    ZScore2    ZScore3    ZScore4
nrow        4         6          4          4


 I have tried bits of code but Im not even really sure how to approach this.

I guess I would group by chr and then see if the row above was the same positive or negative as the current row and then see if the row below was the same direction as the current row. Then move to the next row for that chr.

like image 751
Sebastian Zeki Avatar asked Apr 13 '15 17:04

Sebastian Zeki


2 Answers

Try this with the package dplyr

library(dplyr)

Data

df <- data.frame(chr=c(1, 1, 2, 3, 3, 3, 3),
             leftPos=c(24352, 53534, 34, 3443, 3445, 3667, 7882),
             ZScore=c(34, 2, -15, -100, -100, 5, -8))

Code

df %>% group_by(chr) %>% 
   filter(sign(ZScore)==sign(lag(ZScore)) | sign(ZScore)==sign(lead(ZScore))) %>% 
   ungroup
like image 140
dimitris_ps Avatar answered Nov 15 '22 05:11

dimitris_ps


An option using the devel version of data.table (similar to the approach in @dimitris_ps post). Instructions to install the devel version are here

library(data.table)#v1.9.5
na.omit(setDT(df)[, {tmp= sign(ZScore)
  .SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')] },
             by=chr])
#     chr leftPos ZScore
#1:   1   24352     34
#2:   1   53534      2
#3:   3    3443   -100
#4:   3    3445   -100

Update

We could create a function

 f1 <- function(dat, ZCol){
    na.omit(as.data.table(dat)[, {tmp = sign(eval(as.name(ZCol)))
     .SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')]},
    by=chr])[, list(.N)]}

 nm1 <- paste0('ZScore', 1:4)
 setnames(do.call(cbind,lapply(nm1, function(x) f1(df1, x))), nm1)[]
 #   ZScore1 ZScore2 ZScore3 ZScore4
 #1:       4       6       4       4

Or we can use set

 res <- as.data.table(matrix(0, ncol=4, nrow=1, 
                  dimnames=list(NULL, nm1)))
 for(j in seq_along(nm1)){
   set(res, i=NULL, j=j, value=f1(df1,nm1[j]))
  }
 res
 #   ZScore1 ZScore2 ZScore3 ZScore4
 #1:       4       6       4       4

data

df <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L, 
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore = c(34L, 2L, 
-15L, -100L, -100L, 5L, -8L)), .Names = c("chr", "leftPos", "ZScore"
), class = "data.frame", row.names = c(NA, -7L))

 df1 <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
 leftPos = c(24352L, 
 53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore1 = c(34L, 2L, 
 -15L, -100L, -100L, 5L, -8L), ZScore2 = c(43L, 1L, 7L, -4L, -1L, 
 -5L, -9L), ZScore3 = c(19L, -1L, -9L, 4L, 6L, 9L, 1L),
 ZScore4 = c(43L, 
 -9L, -18L, -9L, -1L, 5L, 3L)), .Names = c("chr", "leftPos",
  "ZScore1", "ZScore2", "ZScore3", "ZScore4"), class = "data.frame",
  row.names = c(NA, -7L))
like image 35
akrun Avatar answered Nov 15 '22 06:11

akrun