I have a dataframe as follows:
chr leftPos ZScore1 ZScore2 ZScore3 ZScore4
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
I would like to only keep those rows which have the same chr and have adjacent columns with the ZScore going in the same direction. In other words a row should remain if the row before or after for that chr has the same sign (positive or negative). I would like this to run for all columns with ZS in the column name so that the output ends up just being a number of rows that fulfil the criteria for each row.
For one column the code should result in:
chr leftPos ZScore
1 24352 34
1 53534 2
3 3443 -100
3 3445 -100
but the final output should look like
ZScore1 ZScore2 ZScore3 ZScore4
nrow 4 6 4 4
I have tried bits of code but Im not even really sure how to approach this.
I guess I would group by chr and then see if the row above was the same positive or negative as the current row and then see if the row below was the same direction as the current row. Then move to the next row for that chr.
Try this with the package dplyr
library(dplyr)
Data
df <- data.frame(chr=c(1, 1, 2, 3, 3, 3, 3),
leftPos=c(24352, 53534, 34, 3443, 3445, 3667, 7882),
ZScore=c(34, 2, -15, -100, -100, 5, -8))
Code
df %>% group_by(chr) %>%
filter(sign(ZScore)==sign(lag(ZScore)) | sign(ZScore)==sign(lead(ZScore))) %>%
ungroup
An option using the devel version of data.table
(similar to the approach in @dimitris_ps post). Instructions to install the devel version are here
library(data.table)#v1.9.5
na.omit(setDT(df)[, {tmp= sign(ZScore)
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')] },
by=chr])
# chr leftPos ZScore
#1: 1 24352 34
#2: 1 53534 2
#3: 3 3443 -100
#4: 3 3445 -100
We could create a function
f1 <- function(dat, ZCol){
na.omit(as.data.table(dat)[, {tmp = sign(eval(as.name(ZCol)))
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')]},
by=chr])[, list(.N)]}
nm1 <- paste0('ZScore', 1:4)
setnames(do.call(cbind,lapply(nm1, function(x) f1(df1, x))), nm1)[]
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
Or we can use set
res <- as.data.table(matrix(0, ncol=4, nrow=1,
dimnames=list(NULL, nm1)))
for(j in seq_along(nm1)){
set(res, i=NULL, j=j, value=f1(df1,nm1[j]))
}
res
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
df <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L)), .Names = c("chr", "leftPos", "ZScore"
), class = "data.frame", row.names = c(NA, -7L))
df1 <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore1 = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L), ZScore2 = c(43L, 1L, 7L, -4L, -1L,
-5L, -9L), ZScore3 = c(19L, -1L, -9L, 4L, 6L, 9L, 1L),
ZScore4 = c(43L,
-9L, -18L, -9L, -1L, 5L, 3L)), .Names = c("chr", "leftPos",
"ZScore1", "ZScore2", "ZScore3", "ZScore4"), class = "data.frame",
row.names = c(NA, -7L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With