
Longest run of changes for each dataframe in a list

Tags: list, dataframe, r

I have a list of dataframes, each of which contains a sequence of dates and, for each date, either +1 to indicate an increase or -1 to indicate a decrease.

Here’s an example:

security1 <- data.frame(
    date = seq(from = as.Date('2019-01-01'), to = as.Date('2019-01-10'), by = 'day'),
    direction = c(1, 1, 1, -1, -1, 1, 1, 1, 1, -1))
security2 <- data.frame(
    date = seq(from = as.Date('2019-01-01'), to = as.Date('2019-01-10'), by = 'day'),
    direction = c(1, -1, 1, -1, -1, 1, 1, -1, 1, -1))
clcn <- list(Sec1 = security1, Sec2 = security2)

For each dataframe, I am trying to find the length of the most recent streak of moves in the same direction, and the last time a streak was longer than this one. The current streak may be just 1 day if the previous day’s move was in the other direction.

I’ve searched for several days for an answer to this and found the following, which uses sequence and rle on a single dataframe, in the question “Compute counting variable in dataframe”:

sequence(rle(as.character(data$list))$lengths)

But I’m struggling to feed that into lapply or map to get it to iterate over the list.

I don’t mind the exact output, but ideally it would include: Dataframe name, current streak, previous streak that’s longer, and date that streak ended. But at the most basic, just getting the sequence number added as a new column on the dataframe would be a huge help, and I can (try to) take it from there.

Slurp asked Nov 06 '22 20:11


1 Answer

@akrun has the right idea, but since you asked for the streak to be added to the data.frame as a new column, perhaps:

library(tidyverse)

clcn %>%
  map(~ mutate(., streak = sequence(rle(direction)$lengths)))

$Sec1
         date direction streak
1  2019-01-01         1      1
2  2019-01-02         1      2
3  2019-01-03         1      3
4  2019-01-04        -1      1
5  2019-01-05        -1      2
6  2019-01-06         1      1
7  2019-01-07         1      2
8  2019-01-08         1      3
9  2019-01-09         1      4
10 2019-01-10        -1      1

$Sec2
         date direction streak
1  2019-01-01         1      1
2  2019-01-02        -1      1
3  2019-01-03         1      1
4  2019-01-04        -1      1
5  2019-01-05        -1      2
6  2019-01-06         1      1
7  2019-01-07         1      2
8  2019-01-08        -1      1
9  2019-01-09         1      1
10 2019-01-10        -1      1
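
If you would rather stay in base R, which is what the lapply attempt in the question was aiming at, the same column can probably be added along these lines. This is only a sketch: it rebuilds the example list from the question so it runs on its own, and `clcn_streaks` is just a name chosen here for the result.

```r
# Rebuild the example list from the question so the block is self-contained
dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2019-01-10"), by = "day")
clcn <- list(
  Sec1 = data.frame(date = dates, direction = c(1, 1, 1, -1, -1, 1, 1, 1, 1, -1)),
  Sec2 = data.frame(date = dates, direction = c(1, -1, 1, -1, -1, 1, 1, -1, 1, -1))
)

# rle() gives the length of each run of identical values;
# sequence() then counts 1..n within each run
clcn_streaks <- lapply(clcn, function(df) {
  df$streak <- sequence(rle(df$direction)$lengths)
  df  # return the modified dataframe so lapply collects it
})

clcn_streaks$Sec1$streak
```

lapply keeps the list names, so the result is still indexed by Sec1 and Sec2.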

From there, you could add further columns in the same mutate call, such as a flag marking the rows where the streak count reaches its maximum:

clcn %>%
  map(
    ~ mutate(
      ., 
      streak = sequence(rle(direction)$lengths), 
      max_streak = streak == max(streak)
    )
  )
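
For the summary you described (dataframe name, current streak, previous longer streak, and the date it ended), one possible approach is to work on the rle output directly rather than on the streak column. The sketch below is an assumption about what you want: `streak_summary` is a hypothetical helper, "longer" is taken to mean strictly longer than the current streak, and each dataframe is assumed to have at least one row sorted by date.

```r
# Rebuild the example list from the question so the block is self-contained
dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2019-01-10"), by = "day")
clcn <- list(
  Sec1 = data.frame(date = dates, direction = c(1, 1, 1, -1, -1, 1, 1, 1, 1, -1)),
  Sec2 = data.frame(date = dates, direction = c(1, -1, 1, -1, -1, 1, 1, -1, 1, -1))
)

# Hypothetical helper: one summary row per dataframe
streak_summary <- function(name, df) {
  runs    <- rle(df$direction)
  n_runs  <- length(runs$lengths)
  run_end <- cumsum(runs$lengths)      # row index where each run ends
  current <- runs$lengths[n_runs]      # length of the most recent run

  # Earlier runs that are strictly longer than the current one
  longer <- which(runs$lengths[-n_runs] > current)

  if (length(longer) == 0) {
    prev_len <- NA_integer_            # no earlier run was longer
    prev_end <- as.Date(NA)
  } else {
    last_longer <- max(longer)         # the most recent such run
    prev_len <- runs$lengths[last_longer]
    prev_end <- df$date[run_end[last_longer]]
  }

  data.frame(
    security               = name,
    current_streak         = current,
    previous_longer_streak = prev_len,
    previous_longer_ended  = prev_end
  )
}

# Apply to every dataframe in the list and stack the rows
result <- do.call(rbind, Map(streak_summary, names(clcn), clcn))
rownames(result) <- NULL
result
```

With the example data, Sec1 has a current streak of 1 and the last longer streak was 4 days ending 2019-01-09; Sec2 has a current streak of 1 and the last longer streak was 2 days ending 2019-01-07.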
JasonAizkalns answered Nov 15 '22 11:11