I'm writing a script that takes N records from a table and processes those records via multithreading.
Previously I simply used ORDER BY RAND() in the SQL statement within each worker definition and hoped there would be no duplicates.
This sort of works (deduping is done later), but I would like to make the script more efficient by:
1) querying the table once, extracting the N records, and assigning them to a list
2) splitting the big list into Y roughly equal-sized sublists, which can be accomplished via:
number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    return [lst[i::n] for i in range(n)]

list1 = chunkify(first_names, number_of_workers)
print(list1)
3) when defining the worker function for multithreading, passing a different sublist to each worker. Note - the number of workers (and the number of parts to split the query result into) is defined at the beginning of the script.
However, as I'm fairly new to Python, I have no idea how to pass each sublist to a separate worker (or whether it's even doable).
Any help, other suggestions, etc. would be much appreciated!
Example multithreading code is below. How would I pass a different sublist to each worker?
import threading

def worker():
    # assign sublistN to worker N -- this is the part I don't know how to do
    print(sublistN)

threads = []
for i in range(number_of_workers):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
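For context, here is one way this could be wired up (a sketch, not necessarily the only approach): threading.Thread accepts an args tuple, so each worker can receive its own sublist directly. The print call stands in for the real per-record processing.

```python
import threading

number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    # round-robin split into n sublists
    return [lst[i::n] for i in range(n)]

sublists = chunkify(first_names, number_of_workers)

def worker(sublist):
    # each worker receives its own sublist via the args tuple
    print(sublist)

threads = []
for i in range(number_of_workers):
    t = threading.Thread(target=worker, args=(sublists[i],))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
```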
Thank you in advance!
The easiest way to split a list into equal-sized chunks is to apply the slice operator successively, shifting the start and end positions by a fixed step.
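A minimal sketch of that approach (the chunks name and the sample list are illustrative):

```python
def chunks(lst, size):
    # slide a window of `size` over the list, shifting the start by `size` each step
    return [lst[i:i + size] for i in range(0, len(lst), size)]

print(chunks([1, 2, 3, 4, 5, 6, 7], 3))  # → [[1, 2, 3], [4, 5, 6], [7]]
```

Note the last chunk may be shorter than the others when the list length isn't a multiple of the chunk size.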
Two things:
First, take a look at the Queue object (queue.Queue in Python 3). You don't even need to split the list apart yourself this way: it's designed for distributing a collection of items between multiple threads (there's also a multiprocessing variant, which I'll get to). The docs contain very good examples that fit your requirements.
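A rough sketch of that pattern, assuming Python 3 and using name.upper() as a stand-in for the real per-record processing:

```python
import queue
import threading

first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
number_of_workers = 2

# load all records into a thread-safe queue; no manual chunking needed
q = queue.Queue()
for name in first_names:
    q.put(name)

results = []
results_lock = threading.Lock()

def worker():
    while True:
        try:
            item = q.get_nowait()   # grab the next unclaimed record
        except queue.Empty:
            return                  # queue drained; this worker is done
        with results_lock:
            results.append(item.upper())  # stand-in for real processing
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(number_of_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each worker pulls items until the queue is empty, so the work balances itself even when some records take longer than others.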
Second, unless your workers spend their time waiting on things such as I/O or network requests, threading in Python is no quicker than processing sequentially (probably slower, actually): because of the Global Interpreter Lock, only one thread executes Python bytecode at a time, so threads don't give you parallelism across cores. If that's your case, you'll want the multiprocessing module, which actually spins up a whole new Python process for each worker. It has similar tools, such as queues.