
R: checking pairs of rows in a data frame

Tags:

r

I have a data frame holding information on options, like this:

> chData
myIdx strike_price       date     exdate cp_flag strike_price    return
1 8355342       605000 1996-04-02 1996-05-18       P       605000  0.002340
2 8355433       605000 1996-04-02 1996-05-18       C       605000  0.002340
3 8356541       605000 1996-04-09 1996-05-18       P       605000 -0.003182
4 8356629       605000 1996-04-09 1996-05-18       C       605000 -0.003182
5 8358033       605000 1996-04-16 1996-05-18       P       605000  0.003907
6 8358119       605000 1996-04-16 1996-05-18       C       605000  0.003907
7 8359391       605000 1996-04-23 1996-05-18       P       605000  0.005695

where cp_flag indicates whether an option is a call (C) or a put (P). How can I make sure that for each date there is both a call and a put, and drop the rows for dates where this is not the case? I can do it with a for loop, but is there a more clever way?

asked by stevejb, Dec 16 '22 22:12
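To experiment with the answers, the printed rows can be re-entered as a data frame. This is a minimal sketch, assuming the date and exdate columns are plain character strings (the question does not say which class they have) and leaving out the duplicated strike_price column. Cross-tabulating date against cp_flag with table() then shows which dates are missing a call or a put.

```r
# Re-enter the printed rows (the duplicated strike_price column is left out).
# Assumption: date and exdate are character strings, not Date objects.
chData <- data.frame(
  myIdx        = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  strike_price = 605000,
  date         = rep(c("1996-04-02", "1996-04-09", "1996-04-16", "1996-04-23"),
                     times = c(2, 2, 2, 1)),
  exdate       = "1996-05-18",
  cp_flag      = c("P", "C", "P", "C", "P", "C", "P"),
  return       = c(0.002340, 0.002340, -0.003182, -0.003182,
                   0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# Count calls and puts per date; fixing the factor levels keeps both
# columns in the table even if one flag never occurs in the data.
tab <- table(chData$date, factor(chData$cp_flag, levels = c("C", "P")))
print(tab)

# Dates that are missing either a call or a put
incomplete <- rownames(tab)[tab[, "C"] == 0 | tab[, "P"] == 0]
print(incomplete)
# [1] "1996-04-23"
```

The counts also reveal dates with more than one call or put, which the yes/no checks in the answers would not flag.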


2 Answers

Get the dates that have P's and those that have C's, and use intersect to find the dates that have both.

# x is the options data frame (called chData in the question)
keep_dates <- with(x, intersect(date[cp_flag == 'P'], date[cp_flag == 'C']))
# [1] "1996-04-02" "1996-04-09" "1996-04-16"

Keep only the rows that have dates appearing in keep_dates.

x[ x$date %in% keep_dates, ]
#   myIdx strike_price       date     exdate cp_flag strike_price.1
# 8355342       605000 1996-04-02 1996-05-18       P         605000
# 8355433       605000 1996-04-02 1996-05-18       C         605000
# 8356541       605000 1996-04-09 1996-05-18       P         605000
# 8356629       605000 1996-04-09 1996-05-18       C         605000
# 8358033       605000 1996-04-16 1996-05-18       P         605000
# 8358119       605000 1996-04-16 1996-05-18       C         605000
answered by wch, Jan 11 '23 15:01
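Because the intersect answer refers to the data frame as x, here is a self-contained sketch of the same two steps run against the question's data (re-entered from the printed rows, assuming character dates and omitting the duplicated strike_price column). The function name keep_paired_dates is hypothetical.

```r
# Question's data, re-entered from the printed output (assumption: character dates)
chData <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = rep(c("1996-04-02", "1996-04-09", "1996-04-16", "1996-04-23"),
                times = c(2, 2, 2, 1)),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the two steps of the intersect answer
keep_paired_dates <- function(df) {
  # Step 1: dates that have at least one put AND at least one call
  keep_dates <- with(df, intersect(date[cp_flag == "P"], date[cp_flag == "C"]))
  # Step 2: keep every row whose date is in that set
  df[df$date %in% keep_dates, ]
}

paired <- keep_paired_dates(chData)
print(paired)
```

Subsetting with %in% keeps the surviving rows in their original order, whereas grouping approaches such as ddply return them sorted by the grouping column.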


Using the plyr package:

> library(plyr)
> ddply(chData, "date", function(x) if (all(c("P", "C") %in% x$cp_flag)) x)
    myIdx strike_price       date     exdate cp_flag strike_price.1    return
1 8355342       605000 1996-04-02 1996-05-18       P         605000  0.002340
2 8355433       605000 1996-04-02 1996-05-18       C         605000  0.002340
3 8356541       605000 1996-04-09 1996-05-18       P         605000 -0.003182
4 8356629       605000 1996-04-09 1996-05-18       C         605000 -0.003182
5 8358033       605000 1996-04-16 1996-05-18       P         605000  0.003907
6 8358119       605000 1996-04-16 1996-05-18       C         605000  0.003907
answered by Joshua Ulrich, Jan 11 '23 14:01
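The ddply call can be mirrored in base R without installing plyr: split the rows by date, keep only the groups that contain both flags, and bind them back together. This is a sketch of that split/Filter/rbind pattern, not a plyr API, and it again assumes character dates re-entered from the printed rows.

```r
# Question's data, re-entered from the printed output (assumption: character dates)
chData <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = rep(c("1996-04-02", "1996-04-09", "1996-04-16", "1996-04-23"),
                times = c(2, 2, 2, 1)),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  stringsAsFactors = FALSE
)

# One data frame per date (split() orders the groups by sorted date)
groups <- split(chData, chData$date)

# Same test as the function passed to ddply: both "P" and "C" present
complete <- Filter(function(g) all(c("P", "C") %in% g$cp_flag), groups)

# Stack the surviving groups; unname() keeps the original row names
paired <- do.call(rbind, unname(complete))
print(paired)
```

Like ddply, the result is ordered by date. One difference: if no date qualifies, do.call(rbind, ...) receives an empty list and returns NULL rather than an empty data frame.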