Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to determine concurrency (threads) while using shoryuken for background jobs?

In my Ruby on Rails application, I'm using shoryouken for background processing. I've many sqs queues (6-7) in my application. One of the queue has 2000-3000 jobs and it takes around 3 hours for the worker to process these 2-3k jobs with a default concurrency of 25. So based on what factors can we decide to increase the concurrency (which is the number of threads to process jobs). Please do comment if anything is unclear in the question.

like image 886
TarunJadhwani Avatar asked Feb 08 '17 13:02

TarunJadhwani


2 Answers

Concurrency defaults to 25, but can be changed by altering your shoryuken.yml configuration (see below) or by adding the concurrency argument as so: shoryuken -c {desiredCount}

concurrency: 25  # Update with your desired value.
delay: 25        # The delay in seconds to pause a queue when it's empty. Default 0
queues:
  - [high_priority, 6]
  - [default, 2]
  - [low_priority, 1]

You will need to test the optimal value for performance as you'll run into I/O and CPU bottlenecks as number of concurrent threads rises. Once you've reached the optimal value for your instance(s), you'll need to either increase the number of instances running this job or upgrade the instance(s).

If the bottleneck exists instead on your DB or other resource, you'll need to adjust it accordingly. (Not likely to be the case, but included for thoroughness' sake)

EDIT: Optimizing Performance

In response to your question on optimizing the thread count, the quickest/best way to determine the optimal concurrency value is to change concurrency and measure real-world throughput. There's other approaches, but the golden rule for performance is always to measure in a live production environment. Synthetic benchmarks are only helpful to the extent that they mirror real-time performance. (See also: premature optimization).

This is a case where you can easily end up overthinking things (then again, overthinking things is a perennial problem in development). Just measure with the appropriate metrics (CPU utilization, memory utilization, number of jobs completed per minute), and change the number of threads until you either maximize throughput or run into a bottleneck.

If your tasks are CPU bound you'll see your CPU utilization maxing out. If your tasks are I/O bound you'll see that after some point an increase in concurrent threads does not translate to an increase in throughput even though your CPU utilization fails to rise.

An I/O bottleneck can happen when any of the resources you're reading/writing are unable to keep up with your CPU demands. This includes system resources (memory, disk space), your database performance (DB CPU utilization, read/write limits), as well as other APIs you're connecting with. Network capacity is also a theoretical bottleneck but if it was you'd be big enough to have hired someone with experience in this area. Because there's so many different ways for this to happen, the only real way to figure it out what the bottlenecks are is to have your monitoring in place.

Re: formula, the short answer is that there's no one formula that you can use in this case. The long answer is probably yes, but you'd arrive at the optimum value in the course of collecting all the values you'd need to calculate it.

EDIT 2 : Concurrency, Latency, and Throughput

I realized I forgot to add one more piece of advice. When you're working with background tasks that users are not waiting for, your throughput (jobs per unit of time) is the only thing you want to optimize. Do not optimize for individual job time. It also means you cannot profile the current (and presumably un-bound) performance and get useful data because bottlenecks/constraints are target dependent. The constraints that exist for throughput will NOT be the same as the constraints that exist for individual task time.

(Technically speaking, your concurrency setting is your current constraint)

like image 108
Rich Seviora Avatar answered Oct 20 '22 15:10

Rich Seviora


Three main factors are

  1. Number of Cores
  2. Type of Job - I/O or CPU bound
  3. Is there another application or process running on server

Ideally for a cpu bound task keep number of thread to number of cpu cores.

For I/O bound task it requires benchmarking and calculating wait time for an I/O, and then you can decide the optimal value. For rough estimate if you have 4 cores than for I/O bound task you must keep at max 8 threads.

If you have your rails app running on the same then you will need to reduce number of cores.

Increasing the number of cores will not increase your performance if your system doesnt support.

Refer : http://baddotrobot.com/blog/2013/06/01/optimum-number-of-threads/

like image 1
Keval Gohil Avatar answered Oct 20 '22 15:10

Keval Gohil