Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing days before and after matching row

Tags:

r

I am able to remove rows that match between two data frames, df1 and df2, with some code kindly furnished by @Eric Fail:

df1[!(apply(df1[1:2], 1, toString) %in% apply(df2[1:2], 1, toString)), ]

or with a dplyr solution by @steveb

df1 %>% filter( ! ((date == df2$date) & (ticker == df2$ticker)) )

However, I realized I need to remove not only the shared row like this:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"), stringsAsFactors=F)
df1

  ticker       date
1   MSFT 2016-01-01
2   MSFT 2016-01-02
3   MSFT 2016-01-03
4   MSFT 2016-01-04

df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"), 
date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"), stringsAsFactors=F)
df2

  ticker       date
1   AAPL 2016-01-01
2   GOOG 2016-01-01
3   MSFT 2016-01-02
4     FB 2016-01-03

df3 

  ticker       date
1   MSFT 2016-01-01
2   MSFT 2016-01-03
3   MSFT 2016-01-04

But also the day before, and the day after, the specified row. So my final df would be:

  ticker       date
1   MSFT 2016-01-04

Notice, 3 MSFT 2016-01-02 was the match, so that row needs to be removed, along with the day before and day after, 3 MSFT 2016-01-01 and 3 MSFT 2016-01-03

Example with two matches:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
                  stringsAsFactors=F)

Goal output:

ticker       date
4   MSFT 2016-01-04
like image 621
RyGuy Avatar asked Sep 25 '22 14:09

RyGuy


1 Answers

You could convert the strings to dates so you can add and subtract the days

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03")),
                  stringsAsFactors=F)


(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
#   ticker       date
# 3   MSFT 2016-01-02

df1[!(df1$date %in% (m$date + c(-1,0,1))), ]

#   ticker       date
# 4   MSFT 2016-01-04

edit - for multiple matches, just apply the function(x) across each date

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
                  stringsAsFactors=F)

(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
#   ticker       date
# 3   MSFT 2016-01-01
# 4   MSFT 2016-01-02

df1[!(df1$date %in% (sapply(m$date, function(x) x + c(-1,0,1)))), ]
#   ticker       date
# 4   MSFT 2016-01-04
like image 125
rawr Avatar answered Nov 15 '22 06:11

rawr