Identify consecutive sequences based on a given variable

Tags:

dataframe

r

I am stuck on this. My data frame df1 has the following variables:

  1. serial = the group of people

  2. id1 = the person within the group (e.g. serial 12, id1 1 = group 12, person 1; serial 12, id1 2 = group 12, person 2, etc.)

  3. Day = the day when the first (start) recording was made.

Each day consists of the same number of observations (e.g. 95):

        day1 (Monday)    = day11-day196
        day2 (Tuesday)   = day21-day296
        day3 (Wednesday) = day31-day396
        day4 (Thursday)  = day41-day496
        day5 (Friday)    = day51-day596
        day6 (Saturday)  = day61-day696
        day7 (Sunday)    = day71-day796
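If the raw data really has one column per observation, the per-day counts day1 to day7 shown in df1 below could be derived by counting the non-zero observations of each day. This is a minimal base-R sketch under stated assumptions: the observation columns are assumed to be named `day` + a single-digit day number + an observation index (as the day11 ... day796 ranges suggest), and the toy data frame `raw` is hypothetical, with only 3 observations per day instead of 95. The toy values were chosen so that the resulting counts match the first two rows of df1.

```r
# Hypothetical wide layout: one column per observation, named day<d><k>
# (d = day of week 1..7, k = observation index); 3 observations per day here
obs_cols <- paste0("day", rep(1:7, each = 3), rep(1:3, times = 7))

# Hypothetical toy values (1 = something was recorded, 0 = nothing)
vals <- rbind(
  c(1, 0, 1,  0, 1, 0,  1, 1, 0,  0, 0, 1,  1, 0, 0,  1, 1, 1,  0, 1, 0),
  c(0, 0, 0,  1, 1, 1,  0, 0, 0,  1, 1, 1,  1, 1, 1,  0, 0, 0,  1, 1, 1)
)
raw <- cbind(data.frame(serial = c(12, 123), id1 = c(1, 1)),
             setNames(as.data.frame(vals), obs_cols))

# The day number is the first character after the "day" prefix
day_of_col <- substr(obs_cols, 4, 4)

# For each day, count the non-zero observations in every row
counts <- sapply(split(obs_cols, day_of_col), function(cols) {
  rowSums(raw[cols] != 0)
})
colnames(counts) <- paste0("day", colnames(counts))

# Same shape as df1 (without the Day column)
df_days <- cbind(raw[c("serial", "id1")], counts)
df_days
```

With 95 observations per day the same code applies unchanged, because only the first digit after `day` is used to identify the day.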

Example of df1

serial id1  Day     day1 day2 day3 day4 day5 day6 day7
12      1   Monday    2    1    2    1    1    3    1
123     1   Tuesday   0    3    0    3    3    0    3
10      1   Wednesday 0    3    3    3    3    3    3

I would like to identify consecutive records (no gap between the daily records) and the total number of records.

The starting day for consecutive recordings is given by the 'Day' variable. For example, serial 12 is a consecutive record: recording started on Monday, and there is at least one record (out of the 95 variables) on every day of the week. Over the week (7 x 95 variables), 11 records were made.

A non-consecutive record would be serial 123, as there are gaps on day3 and day6: recording started on Tuesday, and there is no record on Wednesday or Saturday.

Finally, I would like to record the duration of the consecutive recording.

Sample output:

serial  id1  Duration  Occurance  Days
12      1    11        7          day1 day2 day3 day4 day5 day6 day7
123     1    12        0          0
10      1    18        5          day3 day4 day5 day6 day7
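The rules above can be checked against this sample output with a small base-R function for a single row. It is only a sketch of one reading of the rules: Duration is taken as the sum of all seven day counts (which reproduces 11, 12 and 18), and a row counts as consecutive only if every day from the starting Day through day7 has at least one record. The function name `summarise_row` is hypothetical.

```r
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# counts: numeric vector of the 7 daily record counts (day1 ... day7)
# start:  the starting weekday, e.g. "Tuesday"
summarise_row <- function(counts, start) {
  from <- match(as.character(start), wkdays)  # index of the starting day
  days_from_start <- from:7
  has_record <- counts[days_from_start] != 0  # TRUE where the day is non-empty
  consecutive <- all(has_record)              # no gap from the start to Sunday
  list(
    Duration  = sum(counts),                  # total number of records
    Occurance = if (consecutive) length(days_from_start) else 0L,
    Days      = if (consecutive) paste0("day", days_from_start, collapse = " ") else ""
  )
}

summarise_row(c(2, 1, 2, 1, 1, 3, 1), "Monday")     # serial 12: consecutive, 7 days
summarise_row(c(0, 3, 0, 3, 3, 0, 3), "Tuesday")    # serial 123: gaps on day3 and day6
summarise_row(c(0, 3, 3, 3, 3, 3, 3), "Wednesday")  # serial 10: consecutive, 5 days
```

For the non-consecutive row this sketch returns an empty string for Days rather than the 0 shown in the sample output.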

Sample data

df1 <- structure(list(serial = c(12, 123, 10), id1 = c(1, 1, 1),
    Day = structure(1:3, .Label = c("Monday", "Tuesday", "Wednesday"), class = "factor"),
    day1 = c(2, 0, 0), day2 = c(1, 3, 3), day3 = c(2, 0, 3), day4 = c(1, 3, 3),
    day5 = c(1, 3, 3), day6 = c(3, 0, 3), day7 = c(1, 3, 3)),
    row.names = c(NA, 3L), class = "data.frame")

Similar post: R - identify consecutive sequences

asked Apr 13 '20 12:04 by Rstudent




1 Answer

We can use rleid from data.table to get the correct 'Occurance':

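library(data.table)

wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

out1 <- do.call(rbind, Map(function(x, y) {
  i1 <- match(y, wkdays):length(x)  # positions from the starting day to day7
  i2 <- x[i1] != 0                  # TRUE where the day has at least one record
  i3 <- all(i2)                     # consecutive only if there is no gap
  grp1 <- rleid(i2)                 # run id for each stretch of TRUE/FALSE
  Days <- if (i3) tapply(names(x)[i1][i2], grp1[i2], FUN = paste, collapse = ' ') else ''
  Occurance <- if (i3) length(grp1[i2]) else 0
  data.frame(Occurance, Days)
}, asplit(df1[-(1:3)], 1), df1$Day))  # one row of day1..day7 counts per call

# Duration = total number of records over all seven days
out1$Duration <- rowSums(df1[startsWith(names(df1), 'day')])
out1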
 # Occurance                               Days Duration
 #1         7 day1 day2 day3 day4 day5 day6 day7       11
 #2         0                                          12
 #3         5           day3 day4 day5 day6 day7       18
answered Sep 25 '22 19:09 by akrun
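For readers unfamiliar with rleid: it gives consecutive equal values the same id and increments the id at every change. The base-R stand-in below, built on `rle()`, is a sketch for illustration (the helper name `run_id` is hypothetical, not part of data.table). It shows what the run ids look like for serial 123, and how grouping the non-empty day names by run, as the answer's tapply() call does, splits a row with gaps into separate stretches. In the answer itself, Days is only filled when all(i2) is TRUE, so a kept row always forms a single run.

```r
# Base-R stand-in for data.table::rleid() on a single vector: consecutive
# equal values share an id, and the id goes up by one at every change
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Daily counts for serial 123 from its starting day (Tuesday = day2) to day7
counts_123 <- c(day2 = 3, day3 = 0, day4 = 3, day5 = 3, day6 = 0, day7 = 3)
i2 <- counts_123 != 0  # TRUE where the day has at least one record
grp <- run_id(i2)
grp
# [1] 1 2 3 3 4 5

# One string per uninterrupted stretch of non-empty days:
# "day2", "day4 day5", "day7"
tapply(names(counts_123)[i2], grp[i2], paste, collapse = " ")
```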