
Does threading a lot lead to thrashing?

Does threading a lot lead to thrashing if each new thread wants to access memory (specifically the same database, in my case) and perform read/write operations throughout its lifetime?

I assume that this is true. If my assumption is true, what is the best way to maximize CPU utilization? And how can I determine that a specific number of threads will give good CPU utilization?

If my assumption is wrong, please give illustrations that help me understand the scenario clearly.

Shashank asked Jul 09 '15 06:07




2 Answers

Trashy code causes thrashing, not threads. All code is run by some thread, even main(). Temporary objects are garbage collected the same way on any thread.

The subtle part is when each thread preloads its own objects to perform its work, which can duplicate many instances of the same classes. That is usually a small sacrifice to make for the power of concurrency. But it's not trash (no leak, no deterioration).

There is one exception: when some third-party code caches material in thread locals, you could end up caching the same data on every thread. Not really a leak, but not efficient.
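To make the thread-local point concrete, here is a minimal sketch. The class `ThreadLocalDuplication` and its `int[]` "cache" are hypothetical stand-ins for whatever a library might store per thread; the only real API used is `java.lang.ThreadLocal`. It counts how many copies get built when several threads touch the same thread-local cache.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are illustrative, not from any library.
class ThreadLocalDuplication {
    // Counts how many times the "expensive" cached object was built.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Each thread that calls CACHE.get() lazily builds its own private copy.
    static final ThreadLocal<int[]> CACHE = ThreadLocal.withInitial(() -> {
        BUILDS.incrementAndGet();
        return new int[1024]; // stand-in for some costly lookup table
    });

    // Starts `threads` workers that each read the cache twice, then returns
    // how many copies were built in total.
    static int run(int threads) {
        BUILDS.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                CACHE.get(); // first access on this thread builds a copy
                CACHE.get(); // second access reuses this thread's copy
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("copies built: " + run(8));
    }
}
```

One copy is built per thread, so memory use grows with the thread count even though the cached data is identical. That is wasteful, but it is bounded by the number of threads and is not a leak.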

Rule of thumb for number of threads? Depends on the task.

If the tasks are pure computation, like math, then you should not exceed the number of physical (non-hyperthreaded) cores.

If the job is memory intensive along with pure computation work (most cases), then the number of logical (hyperthreaded) cores is your target, because the CPU can use the time one hardware thread spends stalled on memory access to run another thread's computations.

If the job is mostly large sequential disk I/O, then your number of threads should not be much above the number of disk spindles available to read from. This is VERY approximate, since disk caches, DMA, SSDs, RAID and the like completely change how well the disk layer can service your threads without idling. The same applies to random access. However, virtualization these days will throw all your estimates out the window: disk I/O could be much more available than you think, but also much worse.

If the jobs are mostly network I/O waits, then the limit is not really on your side; I would start with about 3x the number of cores. This multiplier simply presumes that each thread waits on the network for 2/3 of its time, which is very low in practice. A thread could spend 99% of its time waiting for network I/O (which would suggest 100x). That is why you see NIO sockets everywhere: they handle many connections with fewer, busier threads.
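The 3x and 100x multipliers above follow from one commonly used estimate: threads ≈ cores × (1 + wait time / compute time). The sketch below only does that arithmetic; `PoolSizing` and `threadsFor` are hypothetical names, and the result is a starting point to measure against, not a guarantee.

```java
// Hypothetical helper; the class and method names are illustrative only.
class PoolSizing {
    // Estimated threads needed to keep `cores` busy when each task blocks for
    // `waitTime` per `computeTime` of actual CPU work (any consistent unit).
    static int threadsFor(int cores, double waitTime, double computeTime) {
        if (cores < 1 || waitTime < 0.0 || computeTime <= 0.0) {
            throw new IllegalArgumentException(
                "need cores >= 1, waitTime >= 0 and computeTime > 0");
        }
        return (int) Math.round(cores * (1.0 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        // availableProcessors() reports logical processors, so hyperthreads count.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("pure computation:        " + threadsFor(cores, 0, 1));
        System.out.println("waiting 2/3 of the time: " + threadsFor(cores, 2, 1));
        System.out.println("waiting 99% of the time: " + threadsFor(cores, 99, 1));
    }
}
```

A thread that waits 2/3 of the time has a wait-to-compute ratio of 2:1, which gives 3x the cores; one that waits 99% of the time has a ratio of 99:1, which gives 100x. The ratios themselves have to come from profiling the real workload.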

user2023577 answered Nov 05 '22 07:11


No. You could have hundreds of idle threads waiting for work and not see any thrashing. Thrashing is caused by the application's working set exceeding the available memory, so active pages need to be reloaded from disk (and even written out to disk first, when temporary variable storage has to be saved so it can be reloaded later).

Threads share an address space, and having many of them active leads to diminishing returns due to lock contention. So in the DB case, many processes reading tables can proceed simultaneously, yet updates of dependent data need to be serialised to keep the data consistent, which may cause lock contention and limit parallel processing.
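The read-versus-update asymmetry can be sketched with a read/write lock. `AccountTable` below is a hypothetical in-memory stand-in, not how any real database implements locking; it uses `java.util.concurrent.locks.ReentrantReadWriteLock`, which lets readers proceed together while writers take turns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical in-memory "table" used only to illustrate the locking pattern.
class AccountTable {
    private final Map<String, Integer> balances = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of threads may hold the read lock at the same time.
    int read(String id) {
        lock.readLock().lock();
        try {
            return balances.getOrDefault(id, 0);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: updates are serialised, and readers wait.
    void deposit(String id, int amount) {
        lock.writeLock().lock();
        try {
            balances.merge(id, amount, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs `threads` writers, each making `deposits` deposits of 1,
    // and returns the final balance.
    static int demo(int threads, int deposits) {
        AccountTable table = new AccountTable();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < deposits; j++) {
                    table.deposit("acct", 1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return table.read("acct");
    }

    public static void main(String[] args) {
        System.out.println("final balance: " + demo(4, 1000));
    }
}
```

The result stays consistent because every deposit holds the write lock, but for the same reason adding more writer threads does not make the updates run any faster. That is the contention limit described above.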

Poorly written queries that need to load and sort large tables in memory may cause thrashing when they exceed free RAM (perhaps because of a poor choice of indexes). You can increase query throughput, and so utilise the CPUs more, by having large disk caches in RAM and by using SSDs to reduce random data access times.

On memory-intensive computations, cache sizes may become important: fewer threads, whose data stays in cache and whose stalls are minimised by CPU prefetching, work better than many threads competing to load their data from main memory.

Rob11311 answered Nov 05 '22 06:11