I’m trying to calculate the maximum winning and losing streak in a dataset (i.e. the highest number of consecutive positive or negative values). I’ve found a somewhat related question here on StackOverflow and even though that gave me some good suggestions, the angle of that question is different, and I’m not (yet) experienced enough to translate and apply that information to this problem. So I was hoping you could help me out, even an suggestion would be great. My data set look like this: <pre class="prettyprint"><code>> subRes Instrument TradeResult.Currency. 1 JPM -3 2 JPM 264 3 JPM 284 4 JPM 69 5 JPM 283 6 JPM -219 7 JPM -91 8 JPM 165 9 JPM -35 10 JPM -294 11 KFT -8 12 KFT -48 13 KFT 125 14 KFT -150 15 KFT -206 16 KFT 107 17 KFT 107 18 KFT 56 19 KFT -26 20 KFT 189 > split(subRes[,2],subRes[,1]) $JPM [1] -3 264 284 69 283 -219 -91 165 -35 -294 $KFT [1] -8 -48 125 -150 -206 107 107 56 -26 189 </code></pre> In this case, the maximum (winning) streak for JPM is four (namely the 264, 284, 69 and 283 consecutive positive results) and for KFT this value is 3 (107, 107, 56). My goal is to create a function which gives the maximum winning streaks per instrument (i.e. JPM: 4, KFT: 3). To achieve that: R needs to compare the current result with the previous result, and if it is higher then there is a streak of at least 2 consecutive positive results. Then R needs to look at the next value, and if this is also higher: add 1 to the already found value of 2. If this value isn’t higher, R needs to move on to the next value, while remembering 2 as the intermediate maximum. I’ve tried <code>cumsum</code> and <code>cummax</code> in accordance with conditional summing (like <code>cumsum(c(TRUE, diff(subRes[,2]) > 0))</code>), which didn’t work out. Also <code>rle</code> in accordance with <code>lapply</code> (like <code> lapply(rle(subRes$TradeResult.Currency.), function(x) diff(x) > 0)</code>) didn’t work. How can I make this work? <h3>Edit 19 January 2011</h3> Calculating the size of an streak Besides the length of the streak, I would also like to incorporate the size of the streak in my analysis. With the answers provided below, I thought I was able to do it by myself, sadly I'm mistaken and run into the following problem(s): With the following data frame: <pre class="prettyprint"><code>> subRes Instrument TradeResult.Currency. 1 JPM -3 2 JPM 264 3 JPM 284 4 JPM 69 5 JPM 283 6 JPM -219 7 JPM -91 8 JPM 165 9 JPM -35 10 JPM -294 11 KFT -8 12 KFT -48 13 KFT 125 14 KFT -150 15 KFT -206 16 KFT 107 17 KFT 107 18 KFT 56 19 KFT -26 20 KFT 189 > lapply(split(subRes[,2], subRes[,1]), function(x) { + df.rle <- ifelse(x > 0, 1, 0) + df.rle <- rle(df.rle) + + wh <- which(df.rle$lengths == max(df.rle$lengths)) + mx <- df.rle$lengths[wh] + suma <- df.rle$lengths[1:wh] + out <- x[(sum(suma) - (suma[length(suma)] - 1)):sum(suma)] + return(out) + }) $JPM [1] 264 284 69 283 $KFT [1] 107 107 56 </code></pre> This result is correct, and changing the last line to <code>return(sum(out))</code> I can get the total size of the streak: <pre class="prettyprint"><code>$JPM [1] 900 $KFT [1] 270 </code></pre> However, the function does not seem to count the losing streaks when changing the <code>ifelse</code> condition: <pre class="prettyprint"><code>lapply(split(subRes[,2], subRes[,1]), function(x) { df.rle <- ifelse(x < 0, 1, 0) df.rle <- rle(df.rle) wh <- which(df.rle$lengths == max(df.rle$lengths)) mx <- df.rle$lengths[wh] suma <- df.rle$lengths[1:wh] out <- x[(sum(suma) - (suma[length(suma)] - 1)):sum(suma)] return(out) }) $JPM [1] 264 284 69 283 $KFT [1] 107 107 56 </code></pre> I don’t see what I need to change about this function to ultimately come to the total sum of the losing streak. However I tweak/change the function, I get the same result or an error. The <code>ifelse</code> function confuses me, because it seems the obvious part of the function to change, yet doesn't result in any change. What obvious point am I missing?

This will work: <pre class="prettyprint"><code>FUN <- function(x, negate = FALSE, na.rm = FALSE) { rles <- rle(x > 0) if(negate) { max(rles$lengths[!rles$values], na.rm = na.rm) } else { max(rles$lengths[rles$values], na.rm = na.rm) } } wins <- lapply(split(subRes[,2],subRes[,1]), FUN) loses <- lapply(split(subRes[,2],subRes[,1]), FUN, negate = TRUE) </code></pre> Giving this: <pre class="prettyprint"><code>> wins $JPM [1] 4 $KFT [1] 3 > loses $JPM [1] 2 $KFT [1] 2 </code></pre> or: <pre class="prettyprint"><code>> sapply(split(subRes[,2],subRes[,1]), FUN) JPM KFT 4 3 > sapply(split(subRes[,2],subRes[,1]), FUN, negate = TRUE) JPM KFT 2 2 </code></pre> You were close, but you needed to apply <code>rle()</code> to each element of your list separately, and also convert <code>TradeResult.Currency.</code> to a logical vector depending indicating above 0 or not. Our function <code>FUN</code> returns just the <code>lengths</code> component of the object returned by <code>rle</code>, and we apply <code>max()</code> to this vector of lengths to find the longest winning run. Note that here <code>split</code> isn't necessary, and you can use the other subset-by-factor-and-apply-function functions (<code>tapply</code>, <code>aggregate</code>, etc) here: <pre class="prettyprint"><code>> with(subRes, aggregate(`TradeResult.Currency.`, + by = list(Instrument = Instrument), FUN)) Instrument x 1 JPM 4 2 KFT 3 > with(subRes, tapply(`TradeResult.Currency.`, Instrument, FUN)) JPM KFT 4 3 </code></pre> <hr> The reason the earlier version wasn't right, was because if you had a longer series of losses than wins (longer series of negative values), would result in the length of the losses series being selected. The modified function adds a <code>'negate'</code> argument to swap the meaning of the test. If we want wins, we leave <code>TRUE</code> and <code>FALSE</code> in <code>$values</code> as they are. If we want losses, we swap <code>TRUE</code> and <code>FALSE</code>. We can then use this <code>$values</code> component to select only the runs that correspond to wins (<code>negate = TRUE</code>) or the runs that correspond to losses (<code>negate = FALSE</code>).

Calculating a consecutive streak in data

I’m trying to calculate the maximum winning and losing streak in a dataset (i.e. the highest number of consecutive positive or negative values). I’ve found a somewhat related question here on StackOverflow and even though that gave me some good suggestions, the angle of that question is different, and I’m not (yet) experienced enough to translate and apply that information to this problem. So I was hoping you could help me out, even an suggestion would be great.

My data set look like this:

> subRes
   Instrument TradeResult.Currency.
1         JPM                    -3
2         JPM                   264
3         JPM                   284
4         JPM                    69
5         JPM                   283
6         JPM                  -219
7         JPM                   -91
8         JPM                   165
9         JPM                   -35
10        JPM                  -294
11        KFT                    -8
12        KFT                   -48
13        KFT                   125
14        KFT                  -150
15        KFT                  -206
16        KFT                   107
17        KFT                   107
18        KFT                    56
19        KFT                   -26
20        KFT                   189
> split(subRes[,2],subRes[,1])
$JPM
 [1]   -3  264  284   69  283 -219  -91  165  -35 -294
$KFT
 [1]   -8  -48  125 -150 -206  107  107   56  -26  189

In this case, the maximum (winning) streak for JPM is four (namely the 264, 284, 69 and 283 consecutive positive results) and for KFT this value is 3 (107, 107, 56).

My goal is to create a function which gives the maximum winning streaks per instrument (i.e. JPM: 4, KFT: 3). To achieve that:

R needs to compare the current result with the previous result, and if it is higher then there is a streak of at least 2 consecutive positive results. Then R needs to look at the next value, and if this is also higher: add 1 to the already found value of 2. If this value isn’t higher, R needs to move on to the next value, while remembering 2 as the intermediate maximum.

I’ve tried cumsum and cummax in accordance with conditional summing (like cumsum(c(TRUE, diff(subRes[,2]) > 0))), which didn’t work out. Also rle in accordance with lapply (like lapply(rle(subRes$TradeResult.Currency.), function(x) diff(x) > 0)) didn’t work.

How can I make this work?

Edit 19 January 2011

Calculating the size of an streak Besides the length of the streak, I would also like to incorporate the size of the streak in my analysis. With the answers provided below, I thought I was able to do it by myself, sadly I'm mistaken and run into the following problem(s):

With the following data frame:

> subRes
   Instrument TradeResult.Currency.
1         JPM                    -3
2         JPM                   264
3         JPM                   284
4         JPM                    69
5         JPM                   283
6         JPM                  -219
7         JPM                   -91
8         JPM                   165
9         JPM                   -35
10        JPM                  -294
11        KFT                    -8
12        KFT                   -48
13        KFT                   125
14        KFT                  -150
15        KFT                  -206
16        KFT                   107
17        KFT                   107
18        KFT                    56
19        KFT                   -26
20        KFT                   189
> lapply(split(subRes[,2], subRes[,1]), function(x) {
+             df.rle <- ifelse(x > 0, 1, 0)
+             df.rle <- rle(df.rle)
+ 
+             wh <- which(df.rle$lengths == max(df.rle$lengths))
+             mx <- df.rle$lengths[wh]
+             suma <- df.rle$lengths[1:wh]
+             out <- x[(sum(suma) - (suma[length(suma)] - 1)):sum(suma)]
+             return(out)
+         })
$JPM
[1] 264 284  69 283

$KFT
[1] 107 107  56

This result is correct, and changing the last line to return(sum(out)) I can get the total size of the streak:

$JPM
[1] 900

$KFT
[1] 270

However, the function does not seem to count the losing streaks when changing the ifelse condition:

lapply(split(subRes[,2], subRes[,1]), function(x) {
            df.rle <- ifelse(x < 0, 1, 0)
            df.rle <- rle(df.rle)

            wh <- which(df.rle$lengths == max(df.rle$lengths))
            mx <- df.rle$lengths[wh]
            suma <- df.rle$lengths[1:wh]
            out <- x[(sum(suma) - (suma[length(suma)] - 1)):sum(suma)]
            return(out)
        })
$JPM
[1] 264 284  69 283

$KFT
[1] 107 107  56

I don’t see what I need to change about this function to ultimately come to the total sum of the losing streak. However I tweak/change the function, I get the same result or an error. The ifelse function confuses me, because it seems the obvious part of the function to change, yet doesn't result in any change. What obvious point am I missing?

What is streak in Python?

In this tutorial, we're going to learn how to calculate streaks in Python using the pandas library and visualize them using Matplotlib. A streak is when several events happen in a row consecutively.

This will work:

FUN <- function(x, negate = FALSE, na.rm = FALSE) {
    rles <- rle(x > 0)
    if(negate) {
        max(rles$lengths[!rles$values], na.rm = na.rm)
    } else {
        max(rles$lengths[rles$values], na.rm = na.rm)
    }
}
wins <- lapply(split(subRes[,2],subRes[,1]), FUN)
loses <- lapply(split(subRes[,2],subRes[,1]), FUN, negate = TRUE)

Giving this:

> wins
$JPM
[1] 4

$KFT
[1] 3
> loses
$JPM
[1] 2

$KFT
[1] 2

or:

> sapply(split(subRes[,2],subRes[,1]), FUN)
JPM KFT 
  4   3
> sapply(split(subRes[,2],subRes[,1]), FUN, negate = TRUE)
JPM KFT 
  2   2

You were close, but you needed to apply rle() to each element of your list separately, and also convert TradeResult.Currency. to a logical vector depending indicating above 0 or not. Our function FUN returns just the lengths component of the object returned by rle, and we apply max() to this vector of lengths to find the longest winning run.

Note that here split isn't necessary, and you can use the other subset-by-factor-and-apply-function functions (tapply, aggregate, etc) here:

> with(subRes, aggregate(`TradeResult.Currency.`, 
+                        by = list(Instrument = Instrument), FUN))
  Instrument x
1        JPM 4
2        KFT 3
> with(subRes, tapply(`TradeResult.Currency.`, Instrument, FUN))
JPM KFT 
  4   3

The reason the earlier version wasn't right, was because if you had a longer series of losses than wins (longer series of negative values), would result in the length of the losses series being selected.

The modified function adds a 'negate' argument to swap the meaning of the test. If we want wins, we leave TRUE and FALSE in $values as they are. If we want losses, we swap TRUE and FALSE. We can then use this $values component to select only the runs that correspond to wins (negate = TRUE) or the runs that correspond to losses (negate = FALSE).

Calculating a consecutive streak in data

Tags:

r

Edit 19 January 2011

Jos

People also ask

1 Answers

Gavin Simpson

Recent Activity

Donate For Us

Calculating a consecutive streak in data

Tags:

r

Edit 19 January 2011

Jos

People also ask

1 Answers

Gavin Simpson

Related questions

Recent Activity

Donate For Us