Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Sidekiq not close old connections?

My Rails 4.1 application uses Sidekiq to run measurements in Celluloid actors. It continously runs out of database connections. It appears that Sidekiq opens 2 connections per job, and old connections never get closed properly.

What the system does

  • Every 15 minutes, I start a MeasurementWorker. By calling MeasurementWorker.perform_async(measurement.id). It does this:
    class MeasurementWorker
      include Sidekiq::Worker
      sidekiq_options retry: false, backtrace: true

      def perform(measurement_id, force = false)
        ActiveRecord::Base.connection_pool.with_connection do
            Measurement.find(measurement_id).run
        end
      end
    end
  • Inside this measurement, when I call .run, it does this:
    # various checks if measurement can be run at all, using AR
    # ...
    begin
      ActiveRecord::Base.connection_pool.with_connection do
      # (I used to have a Timeout.timeout here, but removed it for the
      # sake of simplification)
        @connection = MeasurementConnection.new do |event_info|
          event = Event.new
          event.assign_attributes(event_info)
          event.save
        end
        while @connection.measuring?; end
      end # with_connection
    rescue Exception => e
      # an exception happened, e.g. something during the measurement itself
      # log error (left out here for brevity)
    else
      # all went fine, 
      # save this measurement via AR
    ensure
      # Close and terminate the actor to free up the websocket,
      # if it is still actively measuring something.
      if @connection
        if @connection.alive? and @connection.measuring?
          @connection.close
        end
        while @connection.alive?
          @connection.terminate
          sleep(0.01)
        end
      end
    end
  • The MeasurementConnection is a simple Celluloid actor. There is no AR-related code inside this actor.

Configuration

  • Unicorn: concurrency set to 3
  • Sidekiq 3.3.0: concurrency set to 50, and in the initializer:
    Sidekiq.configure_server do |config|
      if defined?(ActiveRecord::Base)
        config = Rails.application.config.database_configuration[Rails.env]
        config['pool']              = Sidekiq.options[:concurrency] + 2
        config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 5
        ActiveRecord::Base.establish_connection(config)
      end
    end
  • database.yml: pool set to 60, reaping frequency 5
  • PostgreSQL 9.3: maximum connections are 2000, no other modifications

The problem: too many connections open

When I check SELECT * FROM pg_stat_activity;, I see that there are some old connections open with no Sidekiq workers busy, and some new where they are:

16661 measurement 6354  16384 measurement unicorn worker[0] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1   50775 2015-02-20 12:52:48.572551+01   2015-02-20 13:05:05.773178+01 2015-02-20 13:05:05.773565+01 f idle  SELECT COUNT(*) FROM "measurements"
16661 measurement 6406  16384 measurement unicorn worker[2] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1   50776 2015-02-20 12:53:59.636414+01   2015-02-20 13:04:53.930305+01 2015-02-20 13:04:53.931+01  f idle  SELECT COUNT(*) FROM "measurements"
16661 measurement 6687  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50801 2015-02-20 13:00:05.14621+01    2015-02-20 13:04:49.558589+01 2015-02-20 13:04:49.558835+01 f idle  COMMIT
16661 measurement 7042  16384 measurement unicorn worker[1] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1   50997 2015-02-20 13:00:34.874675+01   2015-02-20 13:00:35.376593+01 2015-02-20 13:00:35.376979+01 f idle  SELECT COUNT(*) FROM "measurements"
16661 measurement 6668  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50781 2015-02-20 13:00:04.883553+01   2015-02-20 13:04:19.108365+01 2015-02-20 13:04:19.108567+01 f idle  COMMIT
16661 measurement 6669  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50782 2015-02-20 13:00:04.908349+01   2015-02-20 13:03:57.683036+01 2015-02-20 13:03:57.683236+01 f idle  COMMIT
16661 measurement 6672  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50786 2015-02-20 13:00:04.962251+01   2015-02-20 13:04:32.395137+01 2015-02-20 13:04:32.395344+01 f idle  COMMIT
16661 measurement 6674  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50788 2015-02-20 13:00:04.98456+01    2015-02-20 13:04:32.396335+01 2015-02-20 13:04:32.39652+01  f idle  COMMIT
16661 measurement 6676  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50790 2015-02-20 13:00:05.006847+01   2015-02-20 13:04:19.059628+01 2015-02-20 13:04:19.059831+01 f idle  COMMIT
16661 measurement 6678  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50792 2015-02-20 13:00:05.029448+01   2015-02-20 13:04:23.730293+01 2015-02-20 13:04:23.730523+01 f idle  COMMIT
16661 measurement 6680  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50794 2015-02-20 13:00:05.051932+01   2015-02-20 13:04:49.557435+01 2015-02-20 13:04:49.557633+01 f idle  COMMIT
16661 measurement 6684  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50798 2015-02-20 13:00:05.124225+01   2015-02-20 13:03:51.693799+01 2015-02-20 13:03:51.694034+01 f idle  COMMIT
16661 measurement 6690  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50804 2015-02-20 13:00:05.168099+01   2015-02-20 13:04:54.849239+01 2015-02-20 13:04:54.849459+01 f idle  COMMIT
16661 measurement 6693  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50807 2015-02-20 13:00:05.189661+01   2015-02-20 13:04:18.688459+01 2015-02-20 13:04:18.688732+01 f idle  COMMIT
16661 measurement 6696  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50810 2015-02-20 13:00:05.210659+01   2015-02-20 13:03:57.68424+01  2015-02-20 13:03:57.684483+01 f idle  COMMIT
16661 measurement 6699  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50813 2015-02-20 13:00:05.231641+01   2015-02-20 13:04:04.962397+01 2015-02-20 13:04:04.96258+01  f idle  COMMIT
16661 measurement 6701  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50815 2015-02-20 13:00:05.252357+01   2015-02-20 13:04:41.685372+01 2015-02-20 13:04:41.685594+01 f idle  COMMIT
16661 measurement 6706  16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy]  127.0.0.1   50820 2015-02-20 13:00:05.273301+01   2015-02-20 13:04:23.733488+01 2015-02-20 13:04:23.733681+01 f idle  COMMIT
16661 measurement 7003  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50967 2015-02-20 13:00:09.004487+01   2015-02-20 13:02:02.036429+01 2015-02-20 13:02:02.036696+01 f idle  COMMIT
16661 measurement 7005  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50969 2015-02-20 13:00:11.118961+01   2015-02-20 13:02:48.341078+01 2015-02-20 13:02:48.341294+01 f idle  COMMIT
16661 measurement 7006  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50970 2015-02-20 13:00:12.245408+01   2015-02-20 13:03:04.300372+01 2015-02-20 13:03:04.300575+01 f idle  COMMIT
16661 measurement 7007  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50971 2015-02-20 13:00:12.648636+01   2015-02-20 13:03:01.855616+01 2015-02-20 13:03:01.85588+01  f idle  COMMIT
16661 measurement 7008  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50972 2015-02-20 13:00:12.956139+01   2015-02-20 13:03:13.840023+01 2015-02-20 13:03:13.840466+01 f idle  COMMIT
16661 measurement 7009  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50973 2015-02-20 13:00:13.02424+01    2015-02-20 13:02:50.115996+01 2015-02-20 13:02:50.116259+01 f idle  COMMIT
16661 measurement 7010  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50974 2015-02-20 13:00:13.0909+01   2015-02-20 13:03:09.968+01  2015-02-20 13:03:09.968284+01 f idle  COMMIT
16661 measurement 7014  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50976 2015-02-20 13:00:14.929822+01   2015-02-20 13:03:20.183195+01 2015-02-20 13:03:20.183467+01 f idle  COMMIT
16661 measurement 7020  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50980 2015-02-20 13:00:22.498892+01   2015-02-20 13:03:29.887257+01 2015-02-20 13:03:29.887599+01 f idle  COMMIT
16661 measurement 7021  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50981 2015-02-20 13:00:22.898087+01   2015-02-20 13:03:39.689939+01 2015-02-20 13:03:39.69798+01  f idle  COMMIT
16661 measurement 7022  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50982 2015-02-20 13:00:23.215846+01   2015-02-20 13:03:03.918339+01 2015-02-20 13:03:03.918613+01 f idle  COMMIT
16661 measurement 7023  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50983 2015-02-20 13:00:23.930861+01   2015-02-20 13:03:51.504525+01 2015-02-20 13:03:51.512786+01 f idle  COMMIT
16661 measurement 7025  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50985 2015-02-20 13:00:24.409999+01   2015-02-20 13:03:16.000375+01 2015-02-20 13:03:16.006178+01 f idle  COMMIT
16661 measurement 7027  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50986 2015-02-20 13:00:25.786321+01   2015-02-20 13:03:22.631091+01 2015-02-20 13:03:22.631353+01 f idle  COMMIT
16661 measurement 7045  16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1   50998 2015-02-20 13:00:48.004036+01   2015-02-20 13:03:39.717124+01 2015-02-20 13:03:39.722956+01 f idle  COMMIT

In total, there are 34 connections, but I only ran 15 measurements.

In sidekiq.log, they all show as done:

2015-02-20T12:00:04.879Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: start
2015-02-20T12:00:04.907Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: start
2015-02-20T12:00:04.939Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: start
2015-02-20T12:00:04.940Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: start
2015-02-20T12:00:04.959Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: start
2015-02-20T12:00:04.968Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: start
2015-02-20T12:00:04.984Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: start
2015-02-20T12:00:04.998Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: start
2015-02-20T12:00:05.014Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: start
2015-02-20T12:00:05.030Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: start
2015-02-20T12:00:05.042Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: start
2015-02-20T12:00:05.111Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: start
2015-02-20T12:00:05.113Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: start
2015-02-20T12:00:05.114Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: start
2015-02-20T12:00:05.115Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: start
2015-02-20T12:02:02.333Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: done: 117.291 sec
2015-02-20T12:02:48.983Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: done: 164.077 sec
2015-02-20T12:02:50.688Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: done: 165.577 sec
2015-02-20T12:03:02.429Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: done: 177.314 sec
2015-02-20T12:03:04.581Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: done: 179.641 sec
2015-02-20T12:03:09.453Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: done: 184.573 sec
2015-02-20T12:03:10.362Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: done: 185.394 sec
2015-02-20T12:03:14.232Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: done: 189.118 sec
2015-02-20T12:03:16.347Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: done: 191.408 sec
2015-02-20T12:03:20.398Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: done: 195.439 sec
2015-02-20T12:03:22.947Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: done: 197.833 sec
2015-02-20T12:03:30.212Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: done: 205.228 sec
2015-02-20T12:03:39.931Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: done: 214.918 sec
2015-02-20T12:03:39.936Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: done: 214.906 sec
2015-02-20T12:03:51.694Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: done: 226.696 sec

So, for every measurement, Sidekiq seems to open 2 connections, but it never closes them. What should I do?

like image 871
slhck Avatar asked Feb 20 '15 12:02

slhck


People also ask

Does Sidekiq reuse threads?

Sidekiq will reuse threads for different jobs in some cases, so we must be very careful to cleanup our ProcessContext to ensure we do not pollute the context of other jobs.

Is Sidekiq multithreaded?

At the same time, Sidekiq uses multithreading so it is much more memory-efficient than Rescue.

How Sidekiq works or how would you implement Sidekiq?

Sidekiq is an open-source framework that provides efficient background processing for Ruby applications. It uses Redis as an in-memory data structure to store all of its job and operational data. It's important to be aware that Sidekiq by default doesn't do scheduling, it only executes jobs.

How does Sidekiq work in Rails?

Sidekiq server process pulls jobs from the queue in Redis and processes them. Like your web processes, Sidekiq boots Rails so your jobs and workers have the full Rails API, including Active Record, available for use. The server will instantiate the worker and call perform with the given arguments.


1 Answers

The issue is that your celluloid actor is leaking connections: when you do something requiring an active record connection in a new thread (or even finer apparently) AR automatically checks out a connection but won't check it back in automatically.

If you bracket your database accessing code with

ActiveRecord::Base.connection_pool.with_connection do
  # do stuff
end

that should clear up any connections created within (as you do in your sidekiq worker).

Alternatively you could try explicitly checking out a connection

@connection = ActiveRecord::Base.connection_pool.checkout

and then checking it back in in your finalizer

ActiveRecord::Base.connection_pool.checkin @connection

If your actor is very long lived you might end up hogging the connection longer than you need it though (the flipside is that it will be inefficient to be checking a connection in and out in a tight loop)

like image 66
Frederick Cheung Avatar answered Oct 01 '22 13:10

Frederick Cheung