I have a 2 data frames, one (df1
) that records the daily occurrence of different activities and another (df2
) that records properties of the occurred activity during the day.
From df1
it is possible to identify the repeated occurrence of an activity as well the duration. When the day starts is specified by the Date
variable.
For example:
id
12 the occurrence starts at day1 and ends at d7. In this case the occurrence is 7 and duration is 11.id
123 the week starts at day 5 and ends at d7; occurred in repeated order because of there are gap days at day 6 and duration is 6 and id 123 (starts at day6 ends at day 7) occurred 2 times consecutively and duration 6. In df1
the variable Date defines the day when the record started. For example id 12 record started at day1 and so on.
I would like to identify if during the consecutive occurrence if there are records on the activity properties in df2
.
For example id 12, occurred 7 times and duration is 12 there is record for Wednesday (day3 in df1
) and this record corresponds to the 3 day of the consecutive occurrence. For id 123 there is no data (eg. no consecutive occurrence) but for id 10 for 6 day occurrence and duration 18 there is a record on the 6th day.
Df1:
id day1 day2 day3 day4 day5 day6 day7 Date
12 2 1 2 1 1 3 1 Mon
123 0 3 0 3 3 0 3 Fri
10 0 3 3 3 3 3 3 Sat
Df2:
id c1 c2 Date
12 3 3 Wednesday
123 3 2 Fri
10 3 1 Sat
Outcome:
id c1 c2 Occurrence Position
12 3 3 7 3
123 0 0 0 0
10 3 1 2 1
Sample data: df1
structure(list(id = c(12L, 123L, 10L), day1 = c(2L, 0L, 3L),
day2 = c(1L, 3L, 3L), day3 = c(2L, 0L, 3L), day4 = c(1L,
3L, 3L), day5 = c(1L, 3L, 3L), day6 = c(3L, 0L, 3L), day7 = c(1L,
3L, 3L), Date = c("Monday", "Friday", "Saturday")), row.names = c(NA,
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002a81a571ef0>)
df2:
structure(list(id = c(12, 123, 10), c1 = c(3, 3, 3), c2 = c(3,
2, 1), Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday",
"Wednesday"), class = "factor")), row.names = c(NA, -3L), class = "data.frame")
A solution with dplyr
(maybe not the shortest one):
# library
library(tidyverse)
# get data
df1 <- structure(list(id = c(12L, 123L, 10L),
day1 = c(2L, 0L, 3L),
day2 = c(1L, 3L, 3L),
day3 = c(2L, 0L, 3L),
day4 = c(1L,3L, 3L),
day5 = c(1L, 3L, 3L),
day6 = c(3L, 0L, 3L),
day7 = c(1L,3L, 3L),
Date = c("Monday", "Friday", "Saturday")),
row.names = c(NA,-3L), class = c("data.table", "data.frame"))
df2 <- structure(list(id = c(12, 123, 10),
c1 = c(3, 3, 3),
c2 = c(3, 2, 1),
Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday","Wednesday"),
class = "factor")), row.names = c(NA, -3L), class = "data.frame")
# change days to nummeric (will help you later)
df1 %>% mutate(
Date_nr_df1=case_when(
Date=="Monday" ~ 1,
Date=="Tuesday" ~2,
Date=="Wednesday" ~3,
Date=="Thursday" ~4,
Date=="Friday" ~5,
Date=="Saturday" ~6,
Date=="Sunday" ~7)) -> df1
df2 %>% mutate(
Date_nr_df2=case_when(
Date=="Monday" ~ 1,
Date=="Tuesday" ~2,
Date=="Wednesday" ~3,
Date=="Thursday" ~4,
Date=="Friday" ~5,
Date=="Saturday" ~6,
Date=="Sunday" ~7)) -> df2
# combine data by the id column
left_join(df1,df2, by=c("id")) -> df
# adjust data
df %>%
group_by(id) %>% # to make changes per row
mutate(days=paste0(day1,day2,day3,day4,day5,day6,day7)) %>% #pastes the values together
mutate(days_correct=substring(days,Date_nr_df1)) %>% # applies the start day
mutate(Occurrence_seq=str_split(days_correct, fixed("0"))[[1]][1]) %>% # extracts all days before 0
mutate(Occurrence=nchar(Occurrence_seq)) %>% ## counts these days
mutate(Occurrence=case_when(Occurrence==1 ~ 0, TRUE ~ as.numeric(Occurrence))) %>% # sets Occurrence to 0 if there is no consecutive occurrence
mutate(Position=Date_nr_df2-Date_nr_df1+1) %>% ## calculates the position you wanted
mutate(c1=case_when(Occurrence==0 ~0, TRUE ~ c1),
c2=case_when(Occurrence==0 ~0, TRUE ~c1),
Position=case_when(Occurrence==0 ~ 0, TRUE ~ as.numeric(Position))) %>%
ungroup() %>% ungroups the df
select(id,c1,c2,Occurrence,Position) # selects the wanted variables
#> # A tibble: 3 x 5
#> id c1 c2 Occurrence Position
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 12 3 3 7 3
#> 2 123 0 0 0 0
#> 3 10 3 3 2 1
Created on 2020-04-10 by the reprex package (v0.2.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With