Python 3.13 with free-thread is slow

I was trying out the new free-threaded build of the interpreter, but found that it actually takes longer than the GIL-enabled build. I did observe that CPU usage increases a lot with the free-threaded interpreter. Is there something I misunderstand about this new interpreter?

Version downloaded: python-3.13.0rc2-amd64

Code:

from concurrent.futures import ThreadPoolExecutor
from random import randint

import time


def create_table(size):
    a, b = size
    table = []
    for i in range(0, a):
        row = []
        for j in range(0, b):
            row.append(randint(0, 100))
        table.append(row)
    return table


if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(4) as pool:
        result = pool.map(create_table, [(1000, 10000) for _ in range(10)])
    end = time.perf_counter()
    print(end - start, *[len(each) for each in result])

python3.13t takes 56sec
python3.13 takes 26sec
python3.12 takes 25sec

my benchmark

Boyang Li asked Sep 14 '25 06:09


1 Answer

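The primary culprit appears to be randint. It is a function, not a module: the module-level functions in random (randint, choice, shuffle, and so on) are bound methods of a single hidden Random instance created when the module is imported, so all four threads update the same generator state and appear to contend for it under the free-threaded build. A smaller problem is that the pool can only process 4 tables at a time. Since you want 10 tables in total, and assuming each one takes about the same time, the work runs in roughly three waves of 4, 4, and 2, with two workers idle during the last wave.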
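A quick way to check the sharing is to look at the `__self__` attribute that every bound method carries. This is only a minimal sketch for CPython; the names `shared` and `private` are illustrative and not part of the question or the code that follows.

```python
import random

# A bound method exposes the object it belongs to through __self__.
shared = random.randint.__self__

# The other module-level functions are bound to that same hidden instance.
print(type(shared).__name__)              # Random
print(random.choice.__self__ is shared)   # True
print(random.shuffle.__self__ is shared)  # True

# A generator you construct yourself is a separate object with its own state.
private = random.Random()
print(private.randint.__self__ is shared)  # False
```

Any thread that calls `random.randint` directly is therefore working on `shared`, while a thread that builds its own generator is not.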

Here is the code with the randint problem addressed by giving each call to create_table its own SystemRandom instance, so that no generator is shared between threads:

from concurrent.futures import ThreadPoolExecutor
from random import SystemRandom

import time


def create_table(size):
    a, b = size
    table = []
    random = SystemRandom()  # private generator for this call, not shared with other threads
    for i in range(0, a):
        row = []
        for j in range(0, b):
            row.append(random.randint(0, 100))
        table.append(row)
    return table


if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(4) as pool:
        result = pool.map(create_table, [(1000, 10000) for _ in range(10)])
    end = time.perf_counter()
    print(end - start, *[len(each) for each in result])
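
One caveat about SystemRandom: it draws every number from the operating system's entropy source (os.urandom), which is typically slower than the default Mersenne Twister generator. The following is a minimal sketch of a variant, assuming cryptographic-quality randomness is not needed here, that gives each call a plain Random instance instead. The table size is deliberately shrunk so it runs quickly, so it is not a benchmark, and `rng` and `tables` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from random import Random


def create_table(size):
    a, b = size
    rng = Random()  # private generator for this call; nothing is shared between threads
    return [[rng.randint(0, 100) for _ in range(b)] for _ in range(a)]


if __name__ == "__main__":
    with ThreadPoolExecutor(4) as pool:
        # Small tables (100 x 1000) so the sketch finishes fast.
        tables = list(pool.map(create_table, [(100, 1000)] * 10))
    print(len(tables), len(tables[0]), len(tables[0][0]))
```

Whether this is faster than the SystemRandom version on a given machine is something to measure rather than assume.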

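And here is some code that takes the same approach but manages the threads directly with threading.Thread, which is more flexible, and has each thread fill its own result list so nothing has to be passed back through the executor. Note that this version starts 4 threads and builds 4 tables, one per thread, rather than 10: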

import threading
from random import SystemRandom

import time


def create_table(obj, result: list[list[int]]):
    a, b = obj
    print(f"Starting thread {threading.current_thread().name}")
    random = SystemRandom()
    # Slice assignment fills the caller's list object in place.
    result[:] = [[random.randint(0, 100) for j in range(b)] for i in range(a)]
    print(f"Finished thread {threading.current_thread().name}")


if __name__ == "__main__":
    start = time.perf_counter()
    obj = (1000, 10000)
    results: list[list[list[int]]] = []
    threads: list[threading.Thread] = []
    for _ in range(4):
        result: list[list[int]] = []
        thread = threading.Thread(target=create_table, args=(obj, result))
        thread.start()
        threads.append(thread)
        results.append(result)
    for thread in threads:
        thread.join()
    print([len(r) for r in results])
    end = time.perf_counter()
    print(end - start)
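
The effect of the 4-4-2 batching can be illustrated with a toy model. This is a minimal sketch under a loud assumption: each table is replaced by a fixed `time.sleep`, so it only shows how tasks queue behind a limited number of workers and says nothing about CPU-bound speed. `fake_table`, `run`, `TASKS`, and `TASK_SECONDS` are hypothetical names for illustration.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

TASKS = 10
TASK_SECONDS = 0.1  # assumed stand-in for the time to build one table


def fake_table(_):
    time.sleep(TASK_SECONDS)


def run(workers):
    """Return the wall-clock time to push TASKS tasks through `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(workers) as pool:
        list(pool.map(fake_table, range(TASKS)))
    return time.perf_counter() - start


for workers in (4, 5, 10):
    waves = math.ceil(TASKS / workers)  # 4 workers -> 3 waves (4, 4, 2)
    print(f"{workers} workers: {waves} wave(s), {run(workers):.2f}s")
```

With real CPU-bound work, adding workers beyond the number of available cores would not keep shortening the run the way it does with sleeping tasks.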
Matthew Muller answered Sep 15 '25 19:09