I am able to remove rows that match between two data frames, df1 and df2, with some code kindly furnished by @Eric Fail:
df1[!(apply(df1[1:2], 1, toString) %in% apply(df2[1:2], 1, toString)), ]
or with a dplyr
solution by @steveb
df1 %>% filter( ! ((date == df2$date) & (ticker == df2$ticker)) )
However, I realized I need to remove not only the shared row like this:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"), stringsAsFactors=F)
df1
ticker date
1 MSFT 2016-01-01
2 MSFT 2016-01-02
3 MSFT 2016-01-03
4 MSFT 2016-01-04
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"), stringsAsFactors=F)
df2
ticker date
1 AAPL 2016-01-01
2 GOOG 2016-01-01
3 MSFT 2016-01-02
4 FB 2016-01-03
df3
ticker date
1 MSFT 2016-01-01
2 MSFT 2016-01-03
3 MSFT 2016-01-04
But also the day before, and the day after, the specified row. So my final df would be:
ticker date
1 MSFT 2016-01-04
Notice, 3 MSFT 2016-01-02
was the match, so that row needs to be removed, along with the day before and day after, 3 MSFT 2016-01-01
and 3 MSFT 2016-01-03
Example with two matches:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
stringsAsFactors=F)
Goal output:
ticker date
4 MSFT 2016-01-04
You could convert the strings to dates so you can add and subtract the days
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03")),
stringsAsFactors=F)
(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
# ticker date
# 3 MSFT 2016-01-02
df1[!(df1$date %in% (m$date + c(-1,0,1))), ]
# ticker date
# 4 MSFT 2016-01-04
edit - for multiple matches, just apply the function(x)
across each date
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
stringsAsFactors=F)
(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
# ticker date
# 3 MSFT 2016-01-01
# 4 MSFT 2016-01-02
df1[!(df1$date %in% (sapply(m$date, function(x) x + c(-1,0,1)))), ]
# ticker date
# 4 MSFT 2016-01-04
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With