I'm looking for a way to code a column based information in the previous N rows to a given row. The dataset is sorted.
In short, I want to create a column called oneweeksince
that returns TRUE
if the victims
column is greater than 0 (or !NA) for seven rows after.
Put another way, if, for row[i]
, row[i]$victims > 0
within any row from row[i - 7]
to row[i]
, then row[i]$oneweeksince
should be TRUE
. The oneweeksince
value should also be TRUE
on rows in which victims > 0
or !is.na(victims)
How can I automate the creation of this column/feature? It is also possible to use the date column to compute the date distance. I'm trying to avoid creating a loop because of the slow perf in R.
The dataset should look like this:
date oneweeksince victims
1 2009-01-01 FALSE NA
2 2009-01-02 FALSE NA
3 2009-01-03 FALSE NA
4 2009-01-04 FALSE NA
5 2009-01-05 FALSE NA
6 2009-01-06 FALSE NA
7 2009-01-07 FALSE NA
8 2009-01-08 TRUE 1
9 2009-01-09 TRUE NA
10 2009-01-10 TRUE NA
11 2009-01-11 TRUE NA
12 2009-01-12 TRUE NA
13 2009-01-13 TRUE NA
14 2009-01-14 TRUE NA
15 2009-01-15 TRUE NA
16 2009-01-16 FALSE NA
17 2009-01-17 FALSE NA
18 2009-01-18 FALSE NA
19 2009-01-19 FALSE NA
20 2009-01-20 FALSE NA
The dataset is many years long so I need an efficient way to do it.
After some discussions, this is the most effective and efficient answer.
library(dplyr)
library(zoo)
dat2 <- dat %>%
mutate(roll = rollapplyr(victims > 0, 8, any, na.rm = TRUE, fill = NA, partial = TRUE)) %>%
mutate(oneweeksince = roll > 0) %>%
select(-roll)
A solution using rollapplyr
from the zoo
package. rollapplyr
can apply a function with a rolling window. In this case, we can specify the rolling window to be 8 and apply the mean
function. Notice that rollmean
function is not suitable in this case becuas we cannot specify na.rm = TRUE
in the rollmean
function. The last step is to simply evaluate if the roll
column is larger than 1.
library(dplyr)
library(zoo)
dat2 <- dat %>%
mutate(roll = rollapplyr(victims, width = 8, FUN = function(x) mean(x, na.rm = TRUE), fill = NA)) %>%
mutate(oneweeksince = roll > 0) %>%
select(-roll)
# dat2
# date victims oneweeksince
# 1 2009-01-01 NA NA
# 2 2009-01-02 NA NA
# 3 2009-01-03 NA NA
# 4 2009-01-04 NA NA
# 5 2009-01-05 NA NA
# 6 2009-01-06 NA NA
# 7 2009-01-07 NA NA
# 8 2009-01-08 1 TRUE
# 9 2009-01-09 NA TRUE
# 10 2009-01-10 NA TRUE
# 11 2009-01-11 NA TRUE
# 12 2009-01-12 NA TRUE
# 13 2009-01-13 NA TRUE
# 14 2009-01-14 NA TRUE
# 15 2009-01-15 NA TRUE
# 16 2009-01-16 NA NA
# 17 2009-01-17 NA NA
# 18 2009-01-18 NA NA
# 19 2009-01-19 NA NA
DATA
dat <- read.table(text = " date oneweeksince victims
1 '2009-01-01' FALSE NA
2 '2009-01-02' FALSE NA
3 '2009-01-03' FALSE NA
4 '2009-01-04' FALSE NA
5 '2009-01-05' FALSE NA
6 '2009-01-06' FALSE NA
7 '2009-01-07' FALSE NA
8 '2009-01-08' TRUE 1
9 '2009-01-09' TRUE NA
10 '2009-01-10' TRUE NA
11 '2009-01-11' TRUE NA
12 '2009-01-12' TRUE NA
13 '2009-01-13' TRUE NA
14 '2009-01-14' TRUE NA
15 '2009-01-15' TRUE NA
16 '2009-01-16' FALSE NA
17 '2009-01-17' FALSE NA
18 '2009-01-18' FALSE NA
19 '2009-01-19' FALSE NA
20 '2009-01-20' FALSE NA",
header = TRUE, stringsAsFactors = FALSE)
dat$oneweeksince <- NULL
The OP pointed out that my solution will not work if there are entries in the first N rows where N is the window width. Here I provided a solution to address that. I am going to use the same example data frame except that I change the second row of victims
to be 1
. The new solution needs functions from purrr
and tidyr
, so I load the tidyverse
package for this.
library(tidyverse)
library(zoo)
dat2 <- dat %>%
mutate(roll = rollapplyr(victims, width = 8, FUN = function(x) mean(x, na.rm = TRUE), fill = NA)) %>%
# Split the data frame for the first width - 1 rows and others
mutate(GroupID = ifelse(row_number() <= 7, 1L, 2L)) %>%
split(.$GroupID) %>%
# Check if the GroupID is 1. If yes, change the roll column to be the same as victims
# After that, use fill to fill in NA
map_if(function(x) unique(x$GroupID) == 1L,
~.x %>% mutate(roll = victims) %>% fill(roll)) %>%
# Combine data frames
bind_rows() %>%
mutate(oneweeksince = roll > 0) %>%
select(-roll)
# dat2
# date victims GroupID oneweeksince
# 1 2009-01-01 NA 1 NA
# 2 2009-01-02 1 1 TRUE
# 3 2009-01-03 NA 1 TRUE
# 4 2009-01-04 NA 1 TRUE
# 5 2009-01-05 NA 1 TRUE
# 6 2009-01-06 NA 1 TRUE
# 7 2009-01-07 NA 1 TRUE
# 8 2009-01-08 1 2 TRUE
# 9 2009-01-09 NA 2 TRUE
# 10 2009-01-10 NA 2 TRUE
# 11 2009-01-11 NA 2 TRUE
# 12 2009-01-12 NA 2 TRUE
# 13 2009-01-13 NA 2 TRUE
# 14 2009-01-14 NA 2 TRUE
# 15 2009-01-15 NA 2 TRUE
# 16 2009-01-16 NA 2 NA
# 17 2009-01-17 NA 2 NA
# 18 2009-01-18 NA 2 NA
# 19 2009-01-19 NA 2 NA
# 20 2009-01-20 NA 2 NA
DATA
dat <- read.table(text = " date oneweeksince victims
1 '2009-01-01' FALSE NA
2 '2009-01-02' FALSE 1
3 '2009-01-03' FALSE NA
4 '2009-01-04' FALSE NA
5 '2009-01-05' FALSE NA
6 '2009-01-06' FALSE NA
7 '2009-01-07' FALSE NA
8 '2009-01-08' TRUE 1
9 '2009-01-09' TRUE NA
10 '2009-01-10' TRUE NA
11 '2009-01-11' TRUE NA
12 '2009-01-12' TRUE NA
13 '2009-01-13' TRUE NA
14 '2009-01-14' TRUE NA
15 '2009-01-15' TRUE NA
16 '2009-01-16' FALSE NA
17 '2009-01-17' FALSE NA
18 '2009-01-18' FALSE NA
19 '2009-01-19' FALSE NA
20 '2009-01-20' FALSE NA",
header = TRUE, stringsAsFactors = FALSE)
dat$oneweeksince <- NULL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With