
Are there any limitations on the number of locks a Python program can create?

I have a single-threaded Python 3 program that I'm trying to convert to use many threads. I have a tree-like data structure that gets read from and written to, and potentially many threads will want to read and write at the same time.

One obvious way to handle this is to have a single lock for the entire data structure: no one can read while a write is happening, no more than one write can happen at a time, and no write can happen while there are pending reads.
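A coarse single-lock version might look like the following sketch (the `LockedTree` name and its dict-backed storage are illustrative stand-ins; note that the stdlib has no built-in readers-writer lock, so this simplest version serializes readers against each other too):

```python
import threading

class LockedTree:
    """Coarse-grained sketch: one lock serializes every access.

    Simplest safe version: readers exclude each other as well.
    A true readers-writer lock (shared reads, exclusive writes)
    would need extra bookkeeping on top of threading.Lock.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}            # stand-in for the real tree

    def read(self, key):
        with self._lock:
            return self._nodes.get(key)

    def write(self, key, value):
        with self._lock:
            self._nodes[key] = value
```

Every operation takes the same lock, so correctness is trivial, at the cost of all concurrency on the structure.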

However, I'd like to make the locking more fine-grained for greater performance. It's a full 16-ary tree, and when fully populated it has about 5 to 6 million leaves (mostly well balanced in practice, but with no guarantee). If I wanted the finest-grained locking, I could lock the parents of the leaves. That would mean over 100 thousand locks.

I must admit, I haven't tried this yet. But I thought I'd ask first: are there any hardware limitations or performance reasons that should stop me from creating so many lock objects? That is, should I consider just locking down to, say, depth 2 from the root (256 locks)?
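The depth-2 scheme could be sketched like this (the `ShardedTree` name and dict-backed storage are hypothetical stand-ins): the first two branch indices of a node's path pick the lock, so at most 16² = 256 locks ever exist.

```python
import threading

FANOUT = 16
DEPTH = 2   # lock at depth 2 from the root -> at most 16**2 = 256 locks

class ShardedTree:
    """Hypothetical sketch: one lock per depth-2 subtree.

    A node's path (sequence of branch indices from the root) picks the
    lock; all paths sharing their first DEPTH indices share one lock.
    """
    def __init__(self):
        self._nodes = {}
        self._locks = {}                # created lazily, capped at FANOUT**DEPTH
        self._meta = threading.Lock()   # guards the _locks dict itself

    def _lock_for(self, path):
        shard = tuple(path[:DEPTH])     # first DEPTH branch indices
        with self._meta:
            return self._locks.setdefault(shard, threading.Lock())

    def write(self, path, value):
        with self._lock_for(path):
            self._nodes[tuple(path)] = value

    def read(self, path):
        with self._lock_for(path):
            return self._nodes.get(tuple(path))
```

Writes to different depth-2 subtrees proceed in parallel; only accesses within the same subtree contend.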

Thanks for any insights.

EDIT:

More details:

I don't know how many cores yet as we're still experimenting as to just how much computing power we'll need, but I'd hazard a guess that just a handful of cores will be used.

I'm aiming for around 50,000 threads. There's async I/O, and one thread per socket. During a bootstrapping phase of the code, as many threads as possible will run simultaneously (as limited by hardware), but that's a one-time cost. The phase we're more interested in is once things are up and running. At that point, I'd hazard a guess that only several thousand threads wake per second. I need to measure the response time, but I'd guess each wake period is around 10 ms. That works out to a few tens of threads active at a time (on average).

Now that I write that out, maybe that's the answer to my question. If I only need a few tens of threads reading or writing at a time, then I don't really need such fine-grained locking on the tree.

Armadillo Jim asked Jun 02 '16 23:06


People also ask

What is the main limitation of threading within Python?

Because of the way the CPython implementation of Python works, threading may not speed up all tasks. This is due to interactions with the GIL that essentially limit execution to one Python thread at a time. Tasks that spend much of their time waiting for external events are generally good candidates for threading.

Are there locks in Python?

A lock can be locked using the acquire() method. Once a thread has acquired the lock, all subsequent attempts to acquire the lock are blocked until it is released. The lock can be released using the release() method.
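For example, with `threading.Lock` (the `shared_state` variable is just an illustrative stand-in for protected data):

```python
import threading

lock = threading.Lock()

lock.acquire()            # blocks until the lock is free
try:
    shared_state = 1      # critical section
finally:
    lock.release()        # always release, even if the body raises

# The context-manager form pairs acquire/release automatically:
with lock:
    shared_state = 2
```

The `with lock:` form is generally preferred, since the lock is released even if the critical section raises an exception.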

How many threads can possess a lock at the same time?

For a thread to work on an object, it must have control over the lock associated with it; it must "hold" the lock. Only one thread can hold a lock at a time. If a thread tries to take a lock that is already held by another thread, it must wait until the lock is released.

How do you avoid deadlocks in Python?

In most cases, deadlocks can be avoided by using best practices in concurrency programming, such as lock ordering, using time outs on waits, and using context managers when acquiring locks.
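A minimal sketch of the lock-ordering idea (the `update_both` helper is hypothetical; ordering by `id()` is one common convention for picking a fixed global order):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def update_both(first, second):
    # Deadlock avoidance by lock ordering: always acquire in a fixed
    # global order (here, by id()), so no two threads can each hold
    # the lock the other is waiting for.
    lo, hi = sorted((first, second), key=id)
    with lo:              # context managers guarantee release
        with hi:
            pass          # critical section touching both resources

# Safe even though the two threads name the locks in opposite order:
t1 = threading.Thread(target=update_both, args=(lock_a, lock_b))
t2 = threading.Thread(target=update_both, args=(lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
```

Without the sort, one thread holding `lock_a` while waiting for `lock_b` and the other holding `lock_b` while waiting for `lock_a` could deadlock forever.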


2 Answers

Premature Optimization

This is a classic example of premature optimization. Without knowing how much time your threads spend blocking, presumably waiting for other writes to happen, it's unclear what you have to gain from creating the added complexity of managing thousands of locks.

The Global Interpreter Lock

Threading itself can be a premature optimization. Is your task easily threadable? Can many threads safely work in parallel? Tasks that require a large amount of shared state (i.e. many and frequent locks) are typically poor candidates for high thread counts. In Python, you're likely to see even less benefit because of the GIL. Are your threads doing a lot of I/O, calling out to external applications, or using Python modules written in C that properly release the GIL? If not, threading might not actually give you any benefit. You can sidestep the GIL by using the multiprocessing module, but there's an overhead to passing locks and writes across process boundaries, and, ironically, it might make your application much slower.

Queues

Another option is to use a write queue. If threads don't actually need to share state, but they all need to write to the same object (i.e. very little reading from that object), you can simply add the writes to a queue and have a single thread process the writes, with no need for any locks.
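A minimal sketch of such a write queue using the stdlib `queue` module (the dict standing in for the tree and the `None` sentinel used for shutdown are illustrative choices):

```python
import queue
import threading

def writer(tree, q):
    # Single consumer: it is the only thread that ever mutates `tree`,
    # so the tree needs no locks, provided readers tolerate slightly
    # stale data while writes sit in the queue.
    while True:
        item = q.get()
        if item is None:          # sentinel: shut down
            break
        key, value = item
        tree[key] = value

tree = {}                          # stand-in for the 16-ary tree
q = queue.Queue()
t = threading.Thread(target=writer, args=(tree, q))
t.start()

for i in range(5):
    q.put((i, i * i))              # producers just enqueue writes
q.put(None)                        # tell the writer to finish
t.join()
```

`queue.Queue` is itself thread-safe, so any number of producer threads can enqueue writes without extra locking.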

Brendan Abel answered Oct 05 '22 02:10


Echoing Brendan Abel: avoid premature optimization. If I have 16 cores, there can be no more than 16 parallel accesses (reads/writes). There's no point having a kajillion locks hanging around. A limited number of locks will suffice. The question then becomes: if I have random access on a set of resources, what is the chance two cores will try to access the same resource at the same time?

For example, if I have 16 cores trying to access 16 resources at random, the chance of one blocking another is high. That is, there are very few permutations of 16 objects versus 16 independent choices from 16 objects: 16! vs 16^16. The chance of not blocking is around 1 in 1 million. (Note: my earlier comment below the question was off.) On the other hand, if I have 16 cores accessing 256 resources at random, the chance of blocking falls to about 38%. By the time you get to 16 cores and 4096 resources, the chance of blocking is under 3%.

This is related to the birthday paradox.
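These figures can be reproduced with a short birthday-paradox computation (`p_block` is an illustrative helper, assuming each core picks a resource uniformly and independently):

```python
from math import prod

def p_block(cores, resources):
    # Probability that at least two of `cores` independent uniform
    # choices among `resources` collide (birthday-paradox style).
    p_free = prod((resources - i) / resources for i in range(cores))
    return 1 - p_free

print(p_block(16, 16))     # not blocking is roughly 1 in a million
print(p_block(16, 256))    # roughly 38%
print(p_block(16, 4096))   # under 3%
```

The no-collision count for 16 cores on 16 resources is the number of permutations, 16!, out of 16^16 total assignments, which is where the 1-in-a-million figure comes from.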

Note: I'm assuming here that one thread blocking another is undesirable, that the wait is unacceptable. But, as above, the thing to do is measure. Don't put more engineering effort into optimizing something that just isn't needed.

Armadillo Jim answered Oct 05 '22 04:10