I'm writing a script that takes N records from a table and processes those records via multithreading.
Previously I simply used ORDER BY RAND() in the SQL statement within each worker definition and hoped there would be no duplicates.
This sort of works (deduping is done later), but I would like to make the script more efficient by:
1) querying the table once, extracting the N records, and assigning them to a list
2) splitting the big list into Y roughly equal-sized sublists, which can be accomplished via:
number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    return [lst[i::n] for i in range(n)]

list1 = chunkify(first_names, number_of_workers)
print(list1)
3) when defining the worker function for multithreading, passing a different sublist to each worker. Note - the number of workers (and the number of parts to split the query result into) is defined at the beginning of the script.
However, as I'm fairly new to Python, I have no idea how to pass each sublist to a separate worker (or whether it's even doable).
Any help, other suggestions, etc. would be much appreciated!
Example multithreading code is below. How would I pass a different sublist to each worker?
import threading

def worker():
    # assign sublistN to worker N -- this is the part I don't know how to do
    print(sublistN)

threads = []
for i in range(number_of_workers):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
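For context, here is one way this could be wired up (a sketch, not necessarily the only approach): threading.Thread accepts an args tuple, so each worker can receive its own sublist directly. The print call stands in for the real per-record processing.

```python
import threading

number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    # round-robin split into n sublists
    return [lst[i::n] for i in range(n)]

sublists = chunkify(first_names, number_of_workers)

def worker(sublist):
    # each worker receives its own sublist via the args tuple
    print(sublist)

threads = []
for i in range(number_of_workers):
    t = threading.Thread(target=worker, args=(sublists[i],))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
```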
Thank you in advance!
The easiest way to split a list into equal-sized chunks is to apply the slice operator successively, shifting the start and end positions by a fixed step.
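A minimal sketch of that approach (the chunks name and the sample list are illustrative):

```python
def chunks(lst, size):
    # slide a window of `size` over the list, shifting the start by `size` each step
    return [lst[i:i + size] for i in range(0, len(lst), size)]

print(chunks([1, 2, 3, 4, 5, 6, 7], 3))  # → [[1, 2, 3], [4, 5, 6], [7]]
```

Note the last chunk may be shorter than the others when the list length isn't a multiple of the chunk size.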
Two things:
First, take a look at the Queue object (queue.Queue in Python 3). You don't even need to split the list apart yourself this way: it's designed for distributing a collection of items between multiple threads (there's also a multiprocessing variant, which I'll get to). The docs contain very good examples that fit your requirements.
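A rough sketch of that pattern, assuming Python 3 and using name.upper() as a stand-in for the real per-record processing:

```python
import queue
import threading

first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
number_of_workers = 2

# load all records into a thread-safe queue; no manual chunking needed
q = queue.Queue()
for name in first_names:
    q.put(name)

results = []
results_lock = threading.Lock()

def worker():
    while True:
        try:
            item = q.get_nowait()   # grab the next unclaimed record
        except queue.Empty:
            return                  # queue drained; this worker is done
        with results_lock:
            results.append(item.upper())  # stand-in for real processing
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(number_of_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each worker pulls items until the queue is empty, so the work balances itself even when some records take longer than others.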
Second, unless your workers spend their time waiting on things such as I/O or network requests, threading in Python is no quicker than processing sequentially (probably slower, actually): because of the Global Interpreter Lock, only one thread executes Python bytecode at a time, so threads don't give you parallelism across cores. If that's your case, you'll want the multiprocessing module, which actually spins up a whole new Python process for each worker. It has similar tools, such as queues.