Hi I want to select a group of values conditioned on a value in a data table.
Specifically I would like to select all columns grouped by date and id for all Positive values where e == 1
id date e logret
7 2011-07-29 1 -0.0272275211
7 2011-07-29 2 0.0034229025
7 2011-07-29 3 0.0042622177
8 2011-07-29 1 0.0035662770
8 2011-07-29 2 -0.0015268474
8 2011-07-29 3 0.0013333333
7 2011-07-30 1 0.0044444444
7 2011-07-30 2 -0.0001111111
7 2011-07-30 3 0.0013333333
here all elements for id 8 and date 2011-07-29 and all elements of id 7 for date 2011-07-30 will be selected since the logret for e == 1 is > 0 where as all elements of id 7 on 2011-07-29 are ignored since the first logret (where e==1) is < 0
Ans:
8 2011-07-29 1 0.0035662770
8 2011-07-29 2 -0.0015268474
8 2011-07-29 3 0.0013333333
7 2011-07-30 1 0.0044444444
7 2011-07-30 2 -0.0001111111
7 2011-07-30 3 0.0013333333
in sql I'd use some kind of subselect to achieve this. I would :
1) Select the id and date where e=1 and logret > 0
2) Select * join on results of subselect
I think data.table can do this as well, but I'm finding it tricky to express it in data.table terms. Specifically, I can replicate step 1, but can't do the join part in step 2.
pos <- DT[e==1][logret > 0]
But can't join the pos values back into my DT
It's not pretty, and it's not in data.table
, but this seems like it would work:
# Recreate your data
df = read.table(header=TRUE, text="id date e logret
7 2011-07-29 1 -0.0272275211
7 2011-07-29 2 0.0034229025
7 2011-07-29 2 0.0042622177
8 2011-07-29 1 0.0035662770
8 2011-07-29 2 -0.0015268474
8 2011-07-29 3 0.0013333333")
df[which(df$id != df$id[which(df$e == 1 & df$logret < 0)]),]
# id date e logret
# 4 8 2011-07-29 1 0.003566277
# 5 8 2011-07-29 2 -0.001526847
# 6 8 2011-07-29 3 0.001333333
#
## Or the equivalent in "positive" terms
#
# df[which(df$id == df$id[which(df$e == 1 & df$logret > 0)]),]
Just off the top of my head (I have not had any experience with the data.table
package; it's on my "to learn" list). Here's a possible solution:
temp = split(df, df$date)
lapply(temp,
function(x)
x[which(x$id == x$id[which(x$e == 1 & x$logret > 0)]),])
# $`2011-07-29`
# id date e logret
# 4 8 2011-07-29 1 0.003566277
# 5 8 2011-07-29 2 -0.001526847
# 6 8 2011-07-29 3 0.001333333
#
# $`2011-07-30`
# id date e logret
# 7 7 2011-07-30 1 0.0044444444
# 8 7 2011-07-30 2 -0.0001111111
# 9 7 2011-07-30 3 0.0013333333
It is also worth trying merge
:
merge(df, df[which(df$e == 1 & df$logret > 0), c(1, 2)])
# id date e logret
# 1 7 2011-07-30 1 0.0044444444
# 2 7 2011-07-30 2 -0.0001111111
# 3 7 2011-07-30 3 0.0013333333
# 4 8 2011-07-29 1 0.0035662770
# 5 8 2011-07-29 2 -0.0015268474
# 6 8 2011-07-29 3 0.0013333333
I have solved it in a round about way :
pos <- DT[e==1][logret > 0, list(id,date)]
ans <- DT[J(pos$id,pos$date)];
would be interested to hear any more elegant 1 line ways to do it in data.table.
EDIT from Matthew :
If key(DT)
is already (id,date)
then a one liner would be :
DT[DT[e==1 & logret>0, list(id,date)]]
and that should be faster, too. If you can rely on id
and date
being the first 2 columns of DT
, then it can be shortened to :
DT[DT[e==1 & logret>0]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With