I'm trying to count consecutive up days in equity return data; so if a positive day is 1 and a negative is 0, a list y=[0,0,1,1,1,0,0,1,0,1,1] should return z=[0,0,1,2,3,0,0,1,0,1,2].
I've come up with a solution that takes only a few lines of code but is very slow:
    from functools import reduce
    import pandas

    y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1])

    def f(x):
        # fold the window: extend the streak on a 1, reset it on a 0
        return reduce(lambda a, b: (a + b) * b, x)

    # older pandas spelled this pandas.expanding_apply(y, f)
    z = y.expanding().apply(f)

I'm guessing I'm looping through the whole list y too many times. Is there a nice Pythonic way of achieving what I want while only going through the data once? I could write a loop myself, but I'm wondering if there's a better way.
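For reference, the recurrence in the lambda above, (a + b) * b, can be run in a single pass with the standard library. This is only a minimal sketch on a plain list, not a pandas-native answer; the names streak and flag are illustrative.

```python
from itertools import accumulate

y = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]

# Carry the running streak forward; multiplying by the current flag
# resets the streak to 0 whenever the flag is 0.
z = list(accumulate(y, lambda streak, flag: (streak + flag) * flag))
print(z)  # [0, 0, 1, 2, 3, 0, 0, 1, 0, 1, 2]
```

Each element is visited once, whereas the expanding apply re-reduces the whole prefix for every row.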
Using the size() or count() method with pandas.DataFrame.groupby() will generate the count of the number of occurrences of data present in a particular column of the DataFrame.
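As a small illustration of the difference between the two methods, here is a sketch on made-up data (the column names direction and ret are assumptions for the example). Note that this counts all occurrences per label, not contiguous runs.

```python
import pandas as pd

# Hypothetical data: a direction label and a return per day, one value missing.
df = pd.DataFrame({
    "direction": ["down", "down", "up", "up", "up", "down"],
    "ret": [-0.01, -0.02, 0.01, None, 0.03, -0.01],
})

sizes = df.groupby("direction").size()           # rows per group, missing values included
counts = df.groupby("direction")["ret"].count()  # non-null values per group
print(sizes)
print(counts)
```

Here size() reports 3 rows for "up" while count() reports 2, because one "up" row has a missing return.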
    >>> y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1])

The following may seem a little magical, but actually uses some common idioms: since pandas doesn't yet have nice native support for a contiguous groupby, you often find yourself needing something like this.
    >>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
    0     0
    1     0
    2     1
    3     2
    4     3
    5     0
    6     0
    7     1
    8     0
    9     1
    10    2
    dtype: int64

Some explanation: first, we compare y against a shifted version of itself to find when the contiguous groups begin:
    >>> y != y.shift()
    0      True
    1     False
    2      True
    3     False
    4     False
    5      True
    6     False
    7      True
    8      True
    9      True
    10    False
    dtype: bool

Then (since False == 0 and True == 1) we can apply a cumulative sum to get a number for the groups:
    >>> (y != y.shift()).cumsum()
    0     1
    1     1
    2     2
    3     2
    4     2
    5     3
    6     3
    7     4
    8     5
    9     6
    10    6
    dtype: int32

We can use groupby and cumcount to get us an integer counting up in each group:
    >>> y.groupby((y != y.shift()).cumsum()).cumcount()
    0     0
    1     1
    2     0
    3     1
    4     2
    5     0
    6     1
    7     0
    8     0
    9     0
    10    1
    dtype: int64

Add one:
    >>> y.groupby((y != y.shift()).cumsum()).cumcount() + 1
    0     1
    1     2
    2     1
    3     2
    4     3
    5     1
    6     2
    7     1
    8     1
    9     1
    10    2
    dtype: int64

And finally zero the values where we had zero to begin with:
    >>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
    0     0
    1     0
    2     1
    3     2
    4     3
    5     0
    6     0
    7     1
    8     0
    9     1
    10    2
    dtype: int64
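If you are starting from raw returns rather than 0/1 flags, the steps above could be wrapped up roughly like this. It is a sketch under a couple of assumptions: the function name consecutive_up_days and the sample returns are made up for illustration, and a return of exactly zero is treated as not an up day.

```python
import pandas as pd

def consecutive_up_days(returns: pd.Series) -> pd.Series:
    """Length of the current run of positive returns; 0 on non-positive days."""
    # Assumption: only strictly positive returns count as "up".
    up = (returns > 0).astype(int)
    # A new run starts wherever the flag differs from the previous day.
    run_id = (up != up.shift()).cumsum()
    # 1-based position within each run, zeroed out on the down days.
    return up * (up.groupby(run_id).cumcount() + 1)

# Hypothetical daily returns whose signs match y from the question.
returns = pd.Series([-0.01, -0.02, 0.01, 0.02, 0.005, -0.01,
                     -0.03, 0.01, -0.02, 0.01, 0.02])
streaks = consecutive_up_days(returns)
print(streaks.tolist())  # [0, 0, 1, 2, 3, 0, 0, 1, 0, 1, 2]
```

Naming the intermediate run_id keeps the one-liner readable and makes it easy to reuse the run labels for other per-run statistics.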