
Split list into N lists, and assign each list to a worker in multithreading

I'm writing a script that takes N records from a table and processes those records using multithreading.

Previously I simply used ORDER BY RAND() in the SQL statement inside each worker definition, and hoped that there would be no duplicates.

This sort of works (deduping is done later), however, I would like to make my script more efficient by:

1) querying the table once, extracting N records, and assigning them to a list

2) splitting the big list into Y roughly equal-sized sublists, which can be accomplished via:

number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    # Sublist i takes every n-th item, starting at index i
    return [lst[i::n] for i in range(n)]

list1 = chunkify(first_names, number_of_workers)
print(list1)

3) passing a different sublist to each worker when defining the worker function. Note: the number of workers (and therefore the number of parts I want to split the query result into) is defined at the beginning of the script. However, as I'm fairly new to Python, I have no idea how to pass each sublist to a separate worker (or whether it is even doable).

Any help or other suggestions would be much appreciated!

An example of my multithreading code is below. How would I use it to give each worker its own sublist?

import threading

number_of_workers = 2

def worker():
    # Pseudocode for the part I don't know how to write:
    #   assign sublistN to worker N
    #   print sublistN
    pass

threads = []
for i in range(number_of_workers):
    print(i)
    print("")
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

Thank you in advance!

FlyingZebra1 asked Dec 20 '17 07:12




1 Answer

Two things:

First, take a look at the Queue object. With it, you don't even need to split the list apart yourself: it is designed for sharing a collection of objects between multiple threads (there's also a multi-process variant, which I'll get to below). The docs contain very good examples that fit your requirements.
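As a rough illustration of the queue approach, here is a minimal sketch. It is written for Python 3, where the module is named `queue` (it is `Queue` in Python 2). The names `process` and `run`, and the use of `None` as a stop signal, are my own assumptions for the example and not anything from your script.

```python
import queue
import threading


def process(record):
    # Hypothetical stand-in for the real per-record work.
    return record.upper()


def worker(tasks, results):
    # Each thread pulls records from the shared queue until it sees
    # the None sentinel, so no manual splitting of the list is needed.
    while True:
        record = tasks.get()
        if record is None:
            break
        results.put(process(record))


def run(records, number_of_workers=2):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(number_of_workers)
    ]
    for t in threads:
        t.start()
    for record in records:
        tasks.put(record)
    for _ in threads:
        tasks.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    # All workers have finished, so the results queue can be drained safely.
    processed = []
    while not results.empty():
        processed.append(results.get())
    return processed


first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
# Completion order is not deterministic, hence the sort.
print(sorted(run(first_names)))
```

If you would rather keep your pre-split sublists, `threading.Thread(target=worker, args=(sublist,))` hands one sublist to each thread, provided `worker` is defined to accept that argument.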

Second, unless your workers spend their time waiting on things such as I/O or network requests, threading in Python is no quicker (and probably slower) than processing sequentially. Threads do not give you parallel execution of Python code: in CPython, the global interpreter lock means only one thread runs Python code at any one time. If your work is CPU-bound, you'll probably want the multiprocessing module, which actually spins up whole new Python processes to do the work. It provides similar tools, such as queues.
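For the CPU-bound case, something along these lines might work. It is only a sketch: it reuses the `chunkify` function from the question, and `process_chunk` is a hypothetical placeholder for whatever you do with each sublist. `Pool.map` sends one sublist to each call and returns the results in the same order as the input.

```python
from multiprocessing import Pool


def chunkify(lst, n):
    # Same splitting scheme as in the question.
    return [lst[i::n] for i in range(n)]


def process_chunk(names):
    # Hypothetical placeholder for the real work done on one sublist.
    return [name.upper() for name in names]


if __name__ == "__main__":
    number_of_workers = 2
    first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
    chunks = chunkify(first_names, number_of_workers)
    # Each sublist is processed in a separate Python process.
    with Pool(processes=number_of_workers) as pool:
        results = pool.map(process_chunk, chunks)
    print(results)
```

The `if __name__ == "__main__":` guard matters here, because on platforms that start worker processes by re-importing the script, unguarded code would run again in every worker.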

SCB answered Oct 26 '22 22:10