 

Caveats to be aware of when using threading in Python?

I'm quite new to threading in Python and have a couple of beginner questions.

When I start more than, say, fifty threads using the Python threading module, I start getting a MemoryError. The threads themselves are very slim and not very memory-hungry, so it seems that the overhead of threading itself is what causes the memory issues.

  • Is there something I can do to increase the memory capacity or otherwise make Python allow for a larger number of threads?
  • What is the maximum number of threads you've been able to run in your Python code using the threading module? Did you do any tricks to achieve that number?
  • Are there any other caveats to be aware of when using the threading module?
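The hunch about per-thread overhead points at one concrete knob: each thread reserves its own stack, and on some platforms the default reservation is large (typically megabytes of address space per thread). A minimal sketch, assuming CPython and assuming that this reservation is the culprit, lowers it with `threading.stack_size()` before creating the threads. The names `run_many_threads` and `worker` and the 256 KiB value are illustrative, not a recommendation. `threading.stack_size()` raises `ValueError` for an invalid size and `RuntimeError` where changing the stack size is not supported.

```python
import threading

# Assumption: the MemoryError comes from the per-thread stack reservation,
# not from the work the threads do. 256 KiB is an illustrative value only;
# threading.stack_size() accepts 0 (platform default) or >= 32768 bytes,
# and some platforms also require a multiple of 4096.
SMALL_STACK = 256 * 1024


def worker(results, index):
    # A deliberately "slim" task: record which thread ran this slot.
    results[index] = threading.current_thread().name


def run_many_threads(count, stack_bytes=SMALL_STACK):
    # stack_size(size) applies to threads created afterwards and
    # returns the previous setting.
    previous = threading.stack_size(stack_bytes)
    try:
        results = [None] * count
        threads = [
            threading.Thread(target=worker, args=(results, i))
            for i in range(count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
    finally:
        # Restore the previous setting so threads created later are unaffected.
        threading.stack_size(previous)


if __name__ == "__main__":
    names = run_many_threads(200)
    print(sum(n is not None for n in names), "threads completed")
```

A smaller stack leaves less room for deep recursion inside each thread, so the value has to fit the actual workload.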
knorv asked Feb 26 '23 22:02



2 Answers

Your question cannot be answered in a general way, as good use of threading always depends on the concrete problem to be solved. You also don't say which Python version you are using, so I assume you use the "default" CPython and not IronPython or something like that. Some hints and ideas for thinking further about your problem:
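Since the answer depends on which interpreter is in use, here is a small sketch of how to report it. It also reports whether the build is 32- or 64-bit, which is worth including for a MemoryError because a 32-bit process has a much smaller address space for thread stacks. The helper name `describe_interpreter` is illustrative.

```python
import platform
import sys


def describe_interpreter():
    # python_implementation() returns a string such as "CPython", "PyPy",
    # "IronPython" or "Jython".
    impl = platform.python_implementation()
    major, minor = sys.version_info[:2]
    # sys.maxsize > 2**32 is the check the platform docs suggest for telling
    # a 64-bit interpreter from a 32-bit one.
    bits = "64-bit" if sys.maxsize > 2**32 else "32-bit"
    return f"{impl} {major}.{minor} ({bits})"


print(describe_interpreter())
```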

  • Why do you need so many threads? Your machine will probably not be able to run them in parallel anyway.
  • Have a look at Stackless Python. I don't know the current status of the project, but I think it was designed for exactly that kind of problem.
  • The global interpreter lock (GIL) prevents pure Python code from really running in parallel. But C code that releases the GIL, such as blocking I/O calls and some extension modules, can run in parallel, so in practice it's sometimes hard to predict how Python will behave with regard to parallelization.
  • Python has many good libraries. Check whether one of them already solves your design problem. If your problem is network-related, have a look at Twisted, for example.
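Following the first and third points, a common alternative to one thread per task is a small, fixed pool of worker threads. A minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`, assuming the work is I/O-bound (simulated here with `time.sleep()`, which releases the GIL so the waits overlap). `fake_io_task`, `run_tasks` and the pool size of 10 are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id):
    # Hypothetical stand-in for a blocking network or disk call.
    # time.sleep() releases the GIL, so other threads run while this one waits.
    time.sleep(0.1)
    return task_id * 2


def run_tasks(task_ids, max_workers=10):
    # A fixed-size pool reuses max_workers threads for every task instead of
    # starting one thread per task; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_io_task, task_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_tasks(range(100), max_workers=10)
    elapsed = time.perf_counter() - start
    # 100 tasks x 0.1 s would take about 10 s sequentially; with 10 threads
    # overlapping their sleeps it should take roughly 1 s.
    print(len(results), "tasks in", round(elapsed, 2), "s")
```

For CPU-bound pure-Python work the same pool would give little or no speedup, for the GIL reason above.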
Achim answered Mar 01 '23 12:03



The Global Interpreter Lock (GIL) is known to significantly limit the performance of CPU-bound threaded code in standard CPython. This is why the documentation of the multiprocessing module notes:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

The GIL probably isn't the cause of your MemoryErrors, but it is something to be aware of.
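To make the quoted description concrete, here is a minimal sketch of CPU-bound work spread over processes with `multiprocessing.Pool`. `sum_of_squares`, `parallel_sums` and the input sizes are illustrative. Each worker is a separate process with its own memory, so this trades the GIL limitation for per-process overhead and does not by itself fix a MemoryError.

```python
import multiprocessing


def sum_of_squares(n):
    # CPU-bound pure-Python work: under threads this would be serialized by
    # the GIL, but each worker process has its own interpreter and GIL.
    return sum(i * i for i in range(n))


def parallel_sums(sizes, processes=None):
    # processes=None lets Pool choose a worker count from the available CPUs.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, sizes)


if __name__ == "__main__":
    # The __main__ guard is required with start methods that import this
    # module in each worker (e.g. "spawn", the default on Windows and macOS).
    print(parallel_sums([10, 100, 1_000_000]))
```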

msw answered Mar 01 '23 14:03
