
Counting consecutive positive values in Python/pandas array

Tags: python, pandas

I'm trying to count consecutive up days in equity return data; so if a positive day is 1 and a negative is 0, a list y=[0,0,1,1,1,0,0,1,0,1,1] should return z=[0,0,1,2,3,0,0,1,0,1,2].

I've come up with a solution that takes only a few lines of code, but it is very slow:

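from functools import reduce  # reduce is no longer a builtin on Python 3
import pandas

y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1])

def f(x):
    # running count that resets to 0 whenever the current value b is 0
    return reduce(lambda a, b: (a + b) * b, x)

# originally pandas.expanding_apply(y, f), which was removed from later pandas releases
z = y.expanding().apply(f, raw=True)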

I'm guessing I'm looping through the whole list y too many times. Is there a nice Pythonic way of achieving what I want while only going through the data once? I could write a loop myself, but I'm wondering if there's a better way.

alex314159 asked Dec 23 '14 19:12



1 Answer

>>> y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1]) 

The following may seem a little magical, but actually uses some common idioms: since pandas doesn't yet have nice native support for a contiguous groupby, you often find yourself needing something like this.

>>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
0     0
1     0
2     1
3     2
4     3
5     0
6     0
7     1
8     0
9     1
10    2
dtype: int64

Some explanation: first, we compare y against a shifted version of itself to find when the contiguous groups begin:

>>> y != y.shift()
0      True
1     False
2      True
3     False
4     False
5      True
6     False
7      True
8      True
9      True
10    False
dtype: bool

Then (since False == 0 and True == 1) we can apply a cumulative sum to get a number for the groups:

>>> (y != y.shift()).cumsum()
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     4
8     5
9     6
10    6
dtype: int32

We can use groupby and cumcount to get us an integer counting up in each group:

>>> y.groupby((y != y.shift()).cumsum()).cumcount()
0     0
1     1
2     0
3     1
4     2
5     0
6     1
7     0
8     0
9     0
10    1
dtype: int64

Add one:

>>> y.groupby((y != y.shift()).cumsum()).cumcount() + 1
0     1
1     2
2     1
3     2
4     3
5     1
6     2
7     1
8     1
9     1
10    2
dtype: int64

And finally zero the values where we had zero to begin with:

>>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
0     0
1     0
2     1
3     2
4     3
5     0
6     0
7     1
8     0
9     1
10    2
dtype: int64
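
If you want to apply this directly to return data rather than to a 0/1 series, the steps above can be wrapped in a small helper. This is only a sketch: the function name up_day_streak and the sample returns are made up for illustration, and it assumes a return of exactly zero (or a missing value) should not count as an up day.

```python
import pandas as pd


def up_day_streak(returns: pd.Series) -> pd.Series:
    """Count consecutive positive values, resetting to 0 on any non-positive value."""
    up = (returns > 0).astype(int)                # 1 for an up day, 0 otherwise
    run_id = (up != up.shift()).cumsum()          # label each contiguous run of equal values
    position = up.groupby(run_id).cumcount() + 1  # 1-based position within its run
    return up * position                          # zero out the runs of down days


# Hypothetical daily returns, invented for this example only.
returns = pd.Series([-0.01, -0.02, 0.03, 0.01, 0.02, -0.01, 0.0, 0.04, -0.03, 0.01, 0.02])
streak = up_day_streak(returns)
print(streak.tolist())  # [0, 0, 1, 2, 3, 0, 0, 1, 0, 1, 2]
```

The sample returns have the same up/down pattern as y in the question, so the result matches the expected z. Passing the original 0/1 series works too, since 1 > 0 and 0 is not.
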

DSM answered Sep 26 '22 01:09