
On what CPU cores are my Python processes running?

Tags: python, python-3.x, multithreading, multiprocessing

The setup

I have written a pretty complex piece of software in Python (on a Windows PC). My software starts basically two Python interpreter shells. The first shell starts up (I suppose) when you double click the main.py file. Within that shell, other threads are started in the following way:

    # Start TCP_thread
    TCP_thread = threading.Thread(name = 'TCP_loop', target = TCP_loop, args = (TCPsock,))
    TCP_thread.start()

    # Start UDP_thread
    UDP_thread = threading.Thread(name = 'UDP_loop', target = UDP_loop, args = (UDPsock,))
    UDP_thread.start()

The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.

The Main_thread also starts a subprocess. This is done in the following way:

    p = subprocess.Popen(['python', mySubprocessPath], shell=True)

From the Python documentation, I understand that this subprocess is running simultaneously (!) in a separate Python interpreter session/shell. The Main_thread in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.

I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:

[Figure (https://i.stack.imgur.com/XFN5o.png): the two Python interpreter sessions and the threads running in each]

I have several questions concerning this setup. I will list them down here:

Question 1 [Solved]

Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?

Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.

Question 2 [Not yet solved]

Do I have a way to track which CPU core it is?

Question 3 [Partly solved]

For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?

Answer: Yes, this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:

    p = subprocess.Popen(['python', mySubprocessPath], shell = True)

The issue has been clarified. This code indeed starts a new Python interpreter instance.

Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?

Question 4 [New question]

The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):

    # Approach 1(a)
    p = subprocess.Popen(['python', mySubprocessPath], shell = True)

    # Approach 1(b) (J.F. Sebastian)
    p = subprocess.Popen([sys.executable, mySubprocessPath])

    # Approach 2
    p = multiprocessing.Process(target=foo, args=(q,))

The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?

204

asked Apr 22 '16 13:04

K.Mulier

3 Answers

Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?

No. The GIL and CPU affinity are unrelated concepts. The GIL can be released anyway during blocking I/O operations and during long CPU-intensive computations inside a C extension.

If a thread is blocked on the GIL, it is probably not running on any CPU core, so it is fair to say that pure Python multithreaded code may use only one CPU core at a time on the CPython implementation.
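To make the point about blocking calls concrete, here is a minimal sketch. It assumes time.sleep is a fair stand-in for blocking I/O, and the helper name run_sleepers is made up for illustration. Four threads that each block for 0.2 s should finish in roughly 0.2 s overall rather than 0.8 s, because a blocked thread does not hold the GIL.

```python
import threading
import time

def run_sleepers(n_threads=4, delay=0.2):
    """Start n_threads that each block for `delay` seconds; return wall time."""
    threads = [threading.Thread(target=time.sleep, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_sleepers()
    # Expect a value close to one delay, not n_threads * delay.
    print("elapsed: {:.2f} s".format(elapsed))
```

The same experiment with a pure-Python busy loop as the target would typically take about as long as running the loops one after another.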

Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?

I don't think CPython manages CPU affinity implicitly. It most likely relies on the OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.
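One way to see that Python threads are real OS threads is to ask each one for its OS-level thread ID. This is only a sketch: it assumes Python 3.8+ (for threading.get_native_id), and the helper name collect_native_ids is invented here.

```python
import threading

def collect_native_ids(n_threads=3):
    """Return the OS-level thread ID seen by each of n_threads Python threads."""
    ids = []
    lock = threading.Lock()
    # The barrier keeps all workers alive at the same moment, so the OS
    # cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(n_threads)

    def worker():
        tid = threading.get_native_id()  # Python 3.8+
        with lock:
            ids.append(tid)
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

if __name__ == "__main__":
    print("main thread:", threading.get_native_id())
    print("workers:", collect_native_ids())
```

The IDs printed are the ones a process monitor or debugger would show for the same threads.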

Q: Or is the Python interpreter able to spread them over multiple cores?

To find out the number of usable CPUs (os.sched_getaffinity is available on Linux and some other Unix platforms, not on Windows):

    >>> import os
    >>> len(os.sched_getaffinity(0))
    16

Again, whether or not threads are scheduled on different CPUs does not depend on the Python interpreter.
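Since os.sched_getaffinity is missing on some platforms (Windows among them), a portable count needs a fallback. A small sketch, with the function name usable_cpu_count chosen just for this example:

```python
import os

def usable_cpu_count():
    """Number of CPUs this process may run on, with a portable fallback."""
    if hasattr(os, "sched_getaffinity"):
        # Linux and some other Unix platforms: respects affinity masks.
        return len(os.sched_getaffinity(0))
    # Elsewhere (e.g. Windows): total logical CPUs, or 1 if undetermined.
    return os.cpu_count() or 1

if __name__ == "__main__":
    print(usable_cpu_count())
```

Note that the fallback reports all logical CPUs and does not account for any affinity restriction placed on the process.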

Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?

I imagine the specific CPU may change from one time slice to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:

    >>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
    b'4'
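A slightly more defensive version of the /proc read above, as a sketch for Linux only. It looks the field up by its position in proc(5) (field 39, "processor") rather than counting from the end, and it copes with a command name that contains spaces. The function name last_cpu_from_proc is hypothetical, and threading.get_native_id assumes Python 3.8+.

```python
import os
import threading

def last_cpu_from_proc(tid=None):
    """Return the CPU a thread last ran on, read from /proc (Linux only).

    Returns None where /proc/<pid>/task/<tid>/stat is not available.
    """
    if tid is None:
        tid = threading.get_native_id()
    path = "/proc/{}/task/{}/stat".format(os.getpid(), tid)
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return None
    # Field 2 (comm) is parenthesised and may contain spaces, so split
    # only what follows the last ')'. That remainder starts at field 3.
    fields = data.rsplit(b")", 1)[-1].split()
    if len(fields) <= 36:
        return None
    # "processor" is field 39, i.e. index 39 - 3 in this list.
    return int(fields[36])

if __name__ == "__main__":
    print("last ran on CPU:", last_cpu_from_proc())
```

Calling it from inside each thread reports that thread's own CPU, and the value can differ between two consecutive calls.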

For a more portable solution, see psutil: its Process.cpu_num() reports the CPU a process is currently on, though not on every platform.

You could restrict the current process to a set of CPUs:

    os.sched_setaffinity(0, {0})  # current process on 0-th core
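A sketch of restricting and then restoring the affinity, assuming a platform that has os.sched_setaffinity (it is absent on Windows). The helper name pin_to_one_cpu is made up, and it picks a CPU from the currently allowed set instead of hard-coding core 0, which may not be allowed in a container.

```python
import os

def pin_to_one_cpu():
    """Restrict the caller to one allowed CPU; return the previous CPU set.

    Returns None where os.sched_setaffinity is missing or the call fails.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)
    target = min(previous)  # a CPU we are already allowed to use
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None
    return previous

if __name__ == "__main__":
    old = pin_to_one_cpu()
    if old is not None:
        print("now restricted to:", os.sched_getaffinity(0))
        os.sched_setaffinity(0, old)  # restore the original mask
```

Threads and child processes started while the restriction is in place inherit it.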

Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?

Yes. The subprocess module creates new OS processes. If you run the python executable, it starts a new Python interpreter. If you run a bash script, no new Python interpreter is created, i.e., running the bash executable does not start a new Python interpreter/session/etc.
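A quick way to convince yourself that the child is a separate interpreter in a separate OS process is to compare process IDs. A minimal sketch (the function name child_interpreter_pid is just for illustration):

```python
import os
import subprocess
import sys

def child_interpreter_pid():
    """Run a one-line script in a new interpreter and return the child's PID."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getpid())"])
    return int(out)

if __name__ == "__main__":
    print("parent:", os.getpid(), "child:", child_interpreter_pid())
```

sys.executable is used so that the child runs the same Python installation as the parent.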

Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?

See above (i.e., the OS decides where to run your thread, and there may be an OS API that exposes where the thread is running).
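As a sketch of such an OS API, the following asks the OS directly through ctypes: GetCurrentProcessorNumber from kernel32 on Windows, and the C library's sched_getcpu on Linux. The helper names current_cpu and report are invented for this example, and on platforms offering neither call the function simply returns None.

```python
import ctypes
import sys
import threading

def current_cpu():
    """Ask the OS which CPU the calling thread is on right now, or None."""
    try:
        if sys.platform == "win32":
            # Win32 API: GetCurrentProcessorNumber (kernel32).
            return int(ctypes.windll.kernel32.GetCurrentProcessorNumber())
        # The C library on Linux exports sched_getcpu(); others may not.
        cpu = ctypes.CDLL(None).sched_getcpu()
        return cpu if cpu >= 0 else None
    except (AttributeError, OSError):
        return None

def report(label):
    """A 'sporadic print statement': thread name, label and current CPU."""
    print("{} [{}]: cpu={}".format(
        threading.current_thread().name, label, current_cpu()))

if __name__ == "__main__":
    report("startup")
```

Calling report(...) from inside TCP_loop, UDP_loop and the GUI script would show each thread's CPU at that instant; the value is only a snapshot and may change on the next call.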

Q: multiprocessing.Process(target=foo, args=(q,)).start()

multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).

Q: In reality, my subprocess is another file. So this example won't work for me.

Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.

Q: Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'python interpreter instance') with subprocess.Popen(..) versus multiprocessing.Process(..)?

multiprocessing.Process() creates its child with the same kind of OS-level mechanism that subprocess.Popen() relies on (fork or spawn on POSIX, CreateProcess on Windows). multiprocessing provides an API that is similar to the threading API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).

If there are no CPU-intensive tasks, you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks, then to utilize multiple CPUs at once either use multiple threads with C extensions such as lxml, regex, numpy (or your own one created using Cython) that can release the GIL during long computations, or offload the tasks into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures).
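A sketch of the "threads plus a C extension" option, using only the standard library. It swaps in hashlib as the C-implemented workload (an assumption made so that the example is self-contained; lxml or numpy would play the same role), since hashlib is documented to release the GIL while hashing buffers larger than 2047 bytes. The names sha256_hex and hash_all are invented here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_hex(data):
    # The hashing itself runs in C with the GIL released for large buffers.
    return hashlib.sha256(data).hexdigest()

def hash_all(blobs, max_workers=4):
    """Hash every blob on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sha256_hex, blobs))

if __name__ == "__main__":
    blobs = [bytes([i]) * 10000000 for i in range(4)]
    for digest in hash_all(blobs):
        print(digest)
```

For pure-Python work the same shape applies with a process pool from concurrent.futures instead, at the cost of pickling the arguments and results.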

Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):

    # Approach 1(a)
    p = subprocess.Popen(['python', mySubprocessPath], shell = True)

    # Approach 1(b) (J.F. Sebastian)
    p = subprocess.Popen([sys.executable, mySubprocessPath])

    # Approach 2
    p = multiprocessing.Process(target=foo, args=(q,))

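"Approach 1(a)" is wrong on POSIX (though it may work on Windows): with shell = True and a list argument, only the first item is run as the shell command and the remaining items are passed to the shell itself, so python never receives the script path. For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).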
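The list-plus-shell pitfall can be demonstrated without the asker's script. In this sketch, echo stands in for python and the helper name echo_with_shell_and_list is made up: on POSIX the word "hello" is handed to the shell rather than to echo, so only an empty line comes back.

```python
import os
import subprocess

def echo_with_shell_and_list():
    """Run ['echo', 'hello'] with shell=True and return what was printed."""
    result = subprocess.run(
        ["echo", "hello"], shell=True,
        stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
        universal_newlines=True)
    return result.stdout

if __name__ == "__main__":
    # repr() makes an empty line visible in the output.
    print(os.name, repr(echo_with_shell_and_list()))
```

On Windows the list is first joined into one command line, which is why the same call appears to work there.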

Q: The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?

subprocess creates new processes of any kind, e.g., you could run a bash script. multiprocessing is used to run Python code in another process. It is more flexible to import a Python module and run its function than to run it as a script. See "Call python script with input with in a python script using subprocess".
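To illustrate the difference in flexibility, the sketch below writes a small stand-in for the second file (the another_file.py content here is hypothetical, not the asker's code) and uses it both ways: as a script through subprocess, where data travels as text, and as an imported module, where the function takes and returns real Python objects.

```python
import importlib.util
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the asker's second file.
MODULE_SOURCE = '''
def foo(x):
    return x * 2

if __name__ == "__main__":
    import sys
    print(foo(int(sys.argv[1])))
'''

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "another_file.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)

        # 1) Run it as a script: arguments and results are strings.
        as_script = subprocess.check_output(
            [sys.executable, path, "21"], universal_newlines=True).strip()

        # 2) Import it and call the function: plain Python objects.
        spec = importlib.util.spec_from_file_location("another_file", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        as_function = module.foo(21)

    return as_script, as_function

if __name__ == "__main__":
    print(demo())
```

The `if __name__ == "__main__":` guard in the module is what lets one file serve both as a script and as an importable module.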

115

answered Oct 09 '22 10:10

jfs

You are using the threading module, which is built on top of thread (_thread in Python 3). As the documentation suggests, on POSIX systems it uses the "POSIX thread implementation" (pthread) of your OS; on Windows it uses native Windows threads.

1. The threads are managed by the OS rather than by the Python interpreter, so the answer depends on the thread library of your system. However, CPython uses the GIL to prevent multiple threads from executing Python bytecode simultaneously, so they are serialized. They can still be scheduled on different cores, which depends on your thread library.
2. Simply use a debugger and attach it to your python.exe, for example with the GDB thread command.
3. Similar to question 1, the new process is managed by your OS and is probably running on a different core. Use a debugger or any process monitor to see it. For more details, go to the CreateProcess() documentation page.

answered Oct 09 '22 09:10

gdlmx

1, 2: You have three real threads, but in CPython they're limited by the GIL, so, assuming they're running pure Python code, you'll see CPU usage as if only one core were used.

3: As gdlmx said, it's up to the OS to choose a core to run a thread on, but if you really need control, you can set process or thread affinity using the native API via ctypes. Since you are on Windows, it would be like this:

    import ctypes
    import subprocess

    # This will run your subprocess on core #0 only
    p = subprocess.Popen(['python', mySubprocessPath], shell = True)
    cpu_mask = 1
    ctypes.windll.kernel32.SetProcessAffinityMask(p._handle, cpu_mask)

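I use the private Popen._handle here for simplicity. The clean way would be OpenProcess(...) with p.pid, etc. Note that with shell = True the handle belongs to the cmd.exe process that launches python.

And yes, subprocess runs python, like everything else, in a new process.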

answered Oct 09 '22 09:10

robyschek
