Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most effective way to incremente a large number of values in Python?

Tags:

python

Okay, sorry if my problem seems a bit rough. I'll try to explain it in a figurative way, I hope this is satisfactory.

10 children.
5 boxes.
Each child chooses three boxes.
Each box is opened:
- If it contains something, all children selected this box gets 1 point
- Otherwise, nobody gets a point.

My question is about what I put in bold. Because in my code, there are lots of kids and lots of boxes.

Currently, I proceed as follows:

children = {"child_1" : 0, ... , "child_10": 0}

gp1 = ["child_3", "child_7", "child_10"] #children who selected the box 1
...
gp5 = ["child_2", "child_5", "child_8", "child_10"]

boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]

for box in boxes:
    if box[0] == 1: #something inside
        for child in box[1]:
            children[child] += 1

I worry mainly about the for loop that assigns each child an extra point. Because in my final code, I have many many children, I fear that doing so would slow the program too.

Is there a more efficient way for all children of the same group may have their point faster?

like image 986
Delgan Avatar asked Sep 01 '13 18:09

Delgan


2 Answers

  1. Represent children as indices into arrays, not as strings:

    childrenScores = [0] * 10
    gp1 = [2,6,9] # children who selected box 1
    ...
    gp5 = [1,4,7,9]
    
    boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]
    
  2. Then, you can store childrenScores as a NumPy array and use advanced indexing:

    childrenScores = np.zeros(10, dtype=int)
    ...
    for box in boxes:
        if box[0]:
            childrenScores[box[1]] += 1 # NumPy advanced indexing
    

    This still involves a loop somewhere, but the loop is deep inside NumPy instead, which should provide a meaningful speedup.

like image 141
nneonneo Avatar answered Nov 16 '22 04:11

nneonneo


The only speed up that I can think of is to use numpy arrays and stream the sum operation.

children[child] += np.ones(len(children[child]))

You should benchmark the operation and see if that is too slow for your business case.

like image 39
fabrizioM Avatar answered Nov 16 '22 02:11

fabrizioM