 

Compute the cumulative sum of a list until a zero appears


I have a (long) list in which zeros and ones appear at random:

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] 

I want to get list_b, where:

  • each element is the cumulative sum of the list since the last 0
  • wherever a 0 appears, 0 is retained in the list

    list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3] 

I can implement this as follows:

list_b = []
for i, x in enumerate(list_a):
    if x == 0:
        list_b.append(x)
    else:
        sum_value = 0
        for j in list_a[i::-1]:
            if j != 0:
                sum_value += j
            else:
                break
        list_b.append(sum_value)
print(list_b)

but the actual list is very long.

So I want to improve the code for speed, even if that makes it less readable.

I changed the code like this:

from itertools import takewhile
list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]
print(list_c)

But it is not fast enough. How can I do it in a more efficient way?

asked Feb 15 '18 by Hiroyuki Taniichi


People also ask

How do you do a cumulative sum list in Python?

Declare an empty list cum_list, to which we will append elements to form the cumulative sum list, and initialize a sum variable sm = 0. Iterate over the input list; on each iteration, increment the sum by the current element and append the updated sum to cum_list.
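
A minimal sketch of that loop, using the cum_list and sm names from the description (the input values here are just an example):

values = [3, 1, 4, 1, 5]      # example input
cum_list = []                 # will hold the running totals
sm = 0                        # running sum
for v in values:
    sm += v                   # add the current element to the running sum
    cum_list.append(sm)       # record the cumulative total so far
print(cum_list)               # [3, 4, 8, 9, 14]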

How do you find the cumulative sum in pandas?

The cumsum() method returns a DataFrame with the cumulative sum of each column, computed row by row. It goes through the values in the DataFrame from the top, adding each value to the running total from the row above, so the last row contains the sum of all values for each column.
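
For instance, a minimal sketch (the DataFrame and its column names are made up for the example):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(df.cumsum())
#    a   b
# 0  1  10
# 1  3  30
# 2  6  60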

How do you find the cumulative sum in Java?

Keep a cumulative sum, and update that sum with each element. After you update the sum, replace the element with the sum:

int[] out = new int[ARRAY SIZE HERE];

You should also note that the method signature returns an array of integers, while the variable total is an integer, not an array of integers.


2 Answers

You're overthinking this.

Option 1
You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:
        data[i] += data[i - 1]

That is, if the current element is non-zero, update the element at the current index to be the current value plus the value at the previous index.

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

Note that this updates your list in place. If you don't want that, create a copy in advance (new_data = data.copy()) and iterate over new_data in the same manner.


Option 2
You can use the pandas API if you need performance. Find groups based on the placement of 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to above:

import pandas as pd

s = pd.Series(data)
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
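
To see why this works, here is a quick sketch of the intermediate grouping key: s.eq(0) marks the zeros, and its cumulative sum assigns every element the id of the group that starts at the most recent zero, so the group-wise cumsum restarts at (and keeps) each 0.

import pandas as pd

s = pd.Series([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
print(s.eq(0).cumsum().tolist())   # group ids: [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]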

Performance

First, the setup -

data = data * 100000
s = pd.Series(data)

Next,

%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:
        new_data[i] += new_data[i - 1]

328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And, timing the copy separately,

%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, the copy doesn't really take much time. Finally,

%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The pandas approach is conceptually linear (just like the other approaches) but faster by a constant factor because of the library's implementation.

answered Sep 20 '22 by cs95


If you want a compact native Python solution that is probably the most memory efficient, although not the fastest (see the comments), you could draw extensively from itertools:

>>> from itertools import groupby, accumulate, chain
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

The steps here are: group the list into sublists based on presence of 0 (which is falsy), take the cumulative sum of the values within each sublist, flatten the sublists.
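
To make those steps concrete, here is a small sketch of the intermediate groups and their per-group cumulative sums (same list_a as above, shown before flattening with chain.from_iterable):

>>> from itertools import groupby, accumulate
>>> [list(accumulate(g)) for _, g in groupby(list_a, bool)]
[[1, 2, 3], [0], [1, 2], [0], [1], [0], [1, 2, 3]]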

As Stefan Pochmann comments, if your list is binary in content (consisting of only 1s and 0s), then you don't need to pass a key to groupby() at all; it will fall back on the identity function. This is ~30% faster than using bool for this case:

>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

answered Sep 19 '22 by Chris_Rands