
How to loop through a python list in batch?

Tags: python, list

A file contains 10000 lines, with one entry on each line. I need to process the file in batches (small chunks).

with open("data.txt", "r") as file:
    data = file.readlines()

total_count = len(data)  # equals ~10000 or less
max_batch = 50  # loop through 'data' with 50 entries at most in each iteration

for i in range(total_count):  # problem: i advances by 1, not by max_batch
    batch = data[i:i + max_batch]  # 50 entries starting at index i
    result = process_data(batch)  # some time-consuming processing on 50 entries
    if result:
        pass  # add to DB that these 50 entries were processed successfully
    else:
        # later, start again from the point where it failed,
        # say the 51st or 2560th or 9950th entry
        break  # quit the operation

What should I change so that the next iteration picks up the 51st to 100th entries, and so on?

If the operation fails partway through, I need to restart the loop from the batch where it failed (based on the DB entry).

I can't work out the right logic. Should I keep two lists, or is there a better way?

uwy59998 asked Jan 26 '17 07:01


1 Answer

l = [1,2,3,4,5,6,7,8,9,10]
batch_size = 3    

for i in range(0, len(l), batch_size):
    print(l[i:i+batch_size])
    # more logic here

>>> [1, 2, 3]
>>> [4, 5, 6]
>>> [7, 8, 9]
>>> [10]

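I think this is the most straightforward, readable approach: the third argument to range() is the step, so i jumps by batch_size on every iteration and the last slice simply comes out shorter. If a batch needs to be retried, you can retry it inside the loop (serially), or you can start a thread per batch; which is better depends on the application.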
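To cover the "start again from the batch where it failed" part of the question, here is a minimal sketch of the same stepped loop with a saved offset and a serial retry. Everything named here is hypothetical: `batches`, `load_checkpoint`, `save_checkpoint` and `run` are illustrative helpers, `process_batch` stands in for the asker's `process_data`, and a small text file stands in for the DB entry that records progress.

```python
import os


def batches(items, batch_size, start=0):
    """Yield (offset, batch) pairs, beginning at index `start`."""
    for offset in range(start, len(items), batch_size):
        yield offset, items[offset:offset + batch_size]


def load_checkpoint(path):
    """Return the saved offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip() or 0)


def save_checkpoint(path, offset):
    """Record the index of the first entry that is still unprocessed."""
    with open(path, "w") as f:
        f.write(str(offset))


def run(items, process_batch, checkpoint_path, batch_size=50, max_retries=2):
    """Process items in batches; return True if every batch succeeded."""
    start = load_checkpoint(checkpoint_path)  # assumption: a file replaces the DB
    for offset, batch in batches(items, batch_size, start):
        # Try the batch once, then retry it up to max_retries more times.
        for _ in range(1 + max_retries):
            if process_batch(batch):
                break
        else:
            # All attempts failed: stop here. The checkpoint still points
            # at this batch, so the next run resumes from it.
            return False
        save_checkpoint(checkpoint_path, offset + len(batch))
    return True
```

Calling `run(data, process_data, "progress.txt")` again after a failure would pick up at the failed batch rather than at entry 0. In a real setup the two checkpoint helpers would presumably read and write the DB row instead of a file.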

talsegal answered Oct 10 '22 23:10