Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increment first n list elements given a condition

I have a list for example

l = [10, 20, 30, 40, 50, 60]

I need to increment the first n elements of the list given a condition. The condition is independent of the list. For example if n = 3, the list l should become :

l = [11, 21, 31, 40, 50, 60]

I understand that I can do it with a for loop on each element of the list. But I need to do such operation around 150 million times. So, I am looking for a faster method to do this. Any help is highly appreciated. Thanks in advance

like image 932
Dreams Avatar asked Sep 27 '22 00:09

Dreams


3 Answers

Here's an operation-aggregating implementation in NumPy:

initial_array = # whatever your l is, but as a NumPy array
increments = numpy.zeros_like(initial_array)
...
# every time you want to increment the first n elements
if n:
    increments[n-1] += 1
...
# to apply the increments
initial_array += increments[::-1].cumsum()[::-1]

This is O(ops + len(initial_array)), where ops is the number of increment operations. Unless you're only doing a small number of increments over a very small portion of the list, this should be much faster. Unlike the naive implementation, it doesn't let you retrieve element values until the increments are applied; if you need to do that, you might need a solution based on a BST or BST-like structure to track increments.

like image 84
user2357112 supports Monica Avatar answered Oct 23 '22 13:10

user2357112 supports Monica


m - queries count, n - list to increment length, O(n + m) algorithm idea:
since you only have to increment from start to some k-th element you will get ranges of increments. Let our increment be pair (up to position, increment by). Example:
(1, 2) - increment positions 0 and 1 by 2
If we are trying to calculate value at position k then we should add increments that have positions greater or equal than k to current value at position k. How we can quickly calculate sum of increments that have positions greater or equal than k? We can start calculating values from the back of the list and then remember sum of increments.
Proof of concept:

# list to increment
a = [1, 2, 5, 1, 6]
# (up to and including k-th index, increment by value)
queries = [(1, 2), (0, 10), (3, 11), (4, 3)]

# decribed algorithm implementation
increments = [0]*len(a)
for position, inc in queries:
    increments[position] += inc

got = list(a)
increments_sum = 0
for i in xrange(len(increments) -1, -1, -1):
    increments_sum += increments[i]
    got[i] += increments_sum


# verify that solution is correct using slow but correct algorithm
expected = list(a)
for position, inc in queries:
    for i in xrange(position + 1):
        expected[i] += inc

print 'Expected: ', expected
print 'Got:      ', got

output:

Expected:  [27, 18, 19, 15, 9]
Got:       [27, 18, 19, 15, 9]
like image 35
Žilvinas Rudžionis Avatar answered Oct 23 '22 14:10

Žilvinas Rudžionis


You can create a simple data structure on top of your list which stores the start and end range of each increment operation. The start would be 0 in your case so you can just store the end.

This way you don't have to actually traverse the list to increment the elements, but you only retain that you performed increments on ranges for example {0 to 2} and {0 to 3}. Furthermore, you can also collate some operations, so that if multiple operations increment until the same index, you only need to store one entry.

The worst case complexity of this solution is O(q + g x qlogq + n) where g is the number of get operations, q is the number of updates and n is the length of the list. Since we can have at most n distinct endings for the intervals this reduces to O(q + nlogn + n) = O(q + nlogn). A naive solution using an update for each query would be O(q * l) where l (the length of a query) could be up to the size of n giving O(q * n). So we can expect this solution to be better when q > log n.

Working python example below:

def RangeStructure(object):

  def __init__(self, l):
    self.ranges = collections.defaultdict(int)
    self.l = l

  def incToPosition(self, k):
    self.ranges[k] += 1

  def get(self):
    res = self.l
    sorted_keys = sorted(self.ranges)
    last = len(sorted_keys) - 1                                                                                                                                                                                                                
    to_add = 0
    while last >= 0:
        start = 0 if last < 1 else sorted_keys[last - 1]
        end = sorted_keys[last]
        to_add += self.ranges[end]
        for i in range(start, end):
            res[i] += to_add
        last -= 1
    return res

rs = RangeStructure([10, 20, 30, 40, 50, 60])
rs.incToPosition(2)
rs.incToPosition(2)
rs.incToPosition(3)
rs.incToPosition(4)
print rs.get()

And an explanation:

  1. after the inc operations ranges will contain (start, end, inc) tuples of the form (0, 2, 2), (0, 3, 1), (0, 4, 1); these will be represented in the dict as { 2:2, 3:1, 4:1} since the start is always 1 and can be omitted

  2. during the get operation, we ensure that we only operate on any list element once; we sort the ranges in increasing order of their end point, and traverse them in reverse order updating the contained list elements and the sum (to_add) to be added to subsequent ranges

This prints, as expected:

[14, 24, 32, 41, 50, 60]
like image 2
paul-g Avatar answered Oct 23 '22 14:10

paul-g