My Rails 4.1 application uses Sidekiq to run measurements in Celluloid actors. It continously runs out of database connections. It appears that Sidekiq opens 2 connections per job, and old connections never get closed properly.
MeasurementWorker
. By calling MeasurementWorker.perform_async(measurement.id)
. It does this: class MeasurementWorker
include Sidekiq::Worker
sidekiq_options retry: false, backtrace: true
def perform(measurement_id, force = false)
ActiveRecord::Base.connection_pool.with_connection do
Measurement.find(measurement_id).run
end
end
end
.run
, it does this: # various checks if measurement can be run at all, using AR
# ...
begin
ActiveRecord::Base.connection_pool.with_connection do
# (I used to have a Timeout.timeout here, but removed it for the
# sake of simplification)
@connection = MeasurementConnection.new do |event_info|
event = Event.new
event.assign_attributes(event_info)
event.save
end
while @connection.measuring?; end
end # with_connection
rescue Exception => e
# an exception happened, e.g. something during the measurement itself
# log error (left out here for brevity)
else
# all went fine,
# save this measurement via AR
ensure
# Close and terminate the actor to free up the websocket,
# if it is still actively measuring something.
if @connection
if @connection.alive? and @connection.measuring?
@connection.close
end
while @connection.alive?
@connection.terminate
sleep(0.01)
end
end
end
MeasurementConnection
is a simple Celluloid actor. There is no AR-related code inside this actor. Sidekiq.configure_server do |config|
if defined?(ActiveRecord::Base)
config = Rails.application.config.database_configuration[Rails.env]
config['pool'] = Sidekiq.options[:concurrency] + 2
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 5
ActiveRecord::Base.establish_connection(config)
end
end
database.yml
: pool set to 60, reaping frequency 5When I check SELECT * FROM pg_stat_activity;
, I see that there are some old connections open with no Sidekiq workers busy, and some new where they are:
16661 measurement 6354 16384 measurement unicorn worker[0] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50775 2015-02-20 12:52:48.572551+01 2015-02-20 13:05:05.773178+01 2015-02-20 13:05:05.773565+01 f idle SELECT COUNT(*) FROM "measurements"
16661 measurement 6406 16384 measurement unicorn worker[2] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50776 2015-02-20 12:53:59.636414+01 2015-02-20 13:04:53.930305+01 2015-02-20 13:04:53.931+01 f idle SELECT COUNT(*) FROM "measurements"
16661 measurement 6687 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50801 2015-02-20 13:00:05.14621+01 2015-02-20 13:04:49.558589+01 2015-02-20 13:04:49.558835+01 f idle COMMIT
16661 measurement 7042 16384 measurement unicorn worker[1] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50997 2015-02-20 13:00:34.874675+01 2015-02-20 13:00:35.376593+01 2015-02-20 13:00:35.376979+01 f idle SELECT COUNT(*) FROM "measurements"
16661 measurement 6668 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50781 2015-02-20 13:00:04.883553+01 2015-02-20 13:04:19.108365+01 2015-02-20 13:04:19.108567+01 f idle COMMIT
16661 measurement 6669 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50782 2015-02-20 13:00:04.908349+01 2015-02-20 13:03:57.683036+01 2015-02-20 13:03:57.683236+01 f idle COMMIT
16661 measurement 6672 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50786 2015-02-20 13:00:04.962251+01 2015-02-20 13:04:32.395137+01 2015-02-20 13:04:32.395344+01 f idle COMMIT
16661 measurement 6674 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50788 2015-02-20 13:00:04.98456+01 2015-02-20 13:04:32.396335+01 2015-02-20 13:04:32.39652+01 f idle COMMIT
16661 measurement 6676 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50790 2015-02-20 13:00:05.006847+01 2015-02-20 13:04:19.059628+01 2015-02-20 13:04:19.059831+01 f idle COMMIT
16661 measurement 6678 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50792 2015-02-20 13:00:05.029448+01 2015-02-20 13:04:23.730293+01 2015-02-20 13:04:23.730523+01 f idle COMMIT
16661 measurement 6680 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50794 2015-02-20 13:00:05.051932+01 2015-02-20 13:04:49.557435+01 2015-02-20 13:04:49.557633+01 f idle COMMIT
16661 measurement 6684 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50798 2015-02-20 13:00:05.124225+01 2015-02-20 13:03:51.693799+01 2015-02-20 13:03:51.694034+01 f idle COMMIT
16661 measurement 6690 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50804 2015-02-20 13:00:05.168099+01 2015-02-20 13:04:54.849239+01 2015-02-20 13:04:54.849459+01 f idle COMMIT
16661 measurement 6693 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50807 2015-02-20 13:00:05.189661+01 2015-02-20 13:04:18.688459+01 2015-02-20 13:04:18.688732+01 f idle COMMIT
16661 measurement 6696 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50810 2015-02-20 13:00:05.210659+01 2015-02-20 13:03:57.68424+01 2015-02-20 13:03:57.684483+01 f idle COMMIT
16661 measurement 6699 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50813 2015-02-20 13:00:05.231641+01 2015-02-20 13:04:04.962397+01 2015-02-20 13:04:04.96258+01 f idle COMMIT
16661 measurement 6701 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50815 2015-02-20 13:00:05.252357+01 2015-02-20 13:04:41.685372+01 2015-02-20 13:04:41.685594+01 f idle COMMIT
16661 measurement 6706 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50820 2015-02-20 13:00:05.273301+01 2015-02-20 13:04:23.733488+01 2015-02-20 13:04:23.733681+01 f idle COMMIT
16661 measurement 7003 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50967 2015-02-20 13:00:09.004487+01 2015-02-20 13:02:02.036429+01 2015-02-20 13:02:02.036696+01 f idle COMMIT
16661 measurement 7005 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50969 2015-02-20 13:00:11.118961+01 2015-02-20 13:02:48.341078+01 2015-02-20 13:02:48.341294+01 f idle COMMIT
16661 measurement 7006 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50970 2015-02-20 13:00:12.245408+01 2015-02-20 13:03:04.300372+01 2015-02-20 13:03:04.300575+01 f idle COMMIT
16661 measurement 7007 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50971 2015-02-20 13:00:12.648636+01 2015-02-20 13:03:01.855616+01 2015-02-20 13:03:01.85588+01 f idle COMMIT
16661 measurement 7008 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50972 2015-02-20 13:00:12.956139+01 2015-02-20 13:03:13.840023+01 2015-02-20 13:03:13.840466+01 f idle COMMIT
16661 measurement 7009 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50973 2015-02-20 13:00:13.02424+01 2015-02-20 13:02:50.115996+01 2015-02-20 13:02:50.116259+01 f idle COMMIT
16661 measurement 7010 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50974 2015-02-20 13:00:13.0909+01 2015-02-20 13:03:09.968+01 2015-02-20 13:03:09.968284+01 f idle COMMIT
16661 measurement 7014 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50976 2015-02-20 13:00:14.929822+01 2015-02-20 13:03:20.183195+01 2015-02-20 13:03:20.183467+01 f idle COMMIT
16661 measurement 7020 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50980 2015-02-20 13:00:22.498892+01 2015-02-20 13:03:29.887257+01 2015-02-20 13:03:29.887599+01 f idle COMMIT
16661 measurement 7021 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50981 2015-02-20 13:00:22.898087+01 2015-02-20 13:03:39.689939+01 2015-02-20 13:03:39.69798+01 f idle COMMIT
16661 measurement 7022 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50982 2015-02-20 13:00:23.215846+01 2015-02-20 13:03:03.918339+01 2015-02-20 13:03:03.918613+01 f idle COMMIT
16661 measurement 7023 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50983 2015-02-20 13:00:23.930861+01 2015-02-20 13:03:51.504525+01 2015-02-20 13:03:51.512786+01 f idle COMMIT
16661 measurement 7025 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50985 2015-02-20 13:00:24.409999+01 2015-02-20 13:03:16.000375+01 2015-02-20 13:03:16.006178+01 f idle COMMIT
16661 measurement 7027 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50986 2015-02-20 13:00:25.786321+01 2015-02-20 13:03:22.631091+01 2015-02-20 13:03:22.631353+01 f idle COMMIT
16661 measurement 7045 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50998 2015-02-20 13:00:48.004036+01 2015-02-20 13:03:39.717124+01 2015-02-20 13:03:39.722956+01 f idle COMMIT
In total, there are 34 connections, but I only ran 15 measurements.
In sidekiq.log
, they all show as done:
2015-02-20T12:00:04.879Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: start
2015-02-20T12:00:04.907Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: start
2015-02-20T12:00:04.939Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: start
2015-02-20T12:00:04.940Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: start
2015-02-20T12:00:04.959Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: start
2015-02-20T12:00:04.968Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: start
2015-02-20T12:00:04.984Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: start
2015-02-20T12:00:04.998Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: start
2015-02-20T12:00:05.014Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: start
2015-02-20T12:00:05.030Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: start
2015-02-20T12:00:05.042Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: start
2015-02-20T12:00:05.111Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: start
2015-02-20T12:00:05.113Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: start
2015-02-20T12:00:05.114Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: start
2015-02-20T12:00:05.115Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: start
2015-02-20T12:02:02.333Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: done: 117.291 sec
2015-02-20T12:02:48.983Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: done: 164.077 sec
2015-02-20T12:02:50.688Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: done: 165.577 sec
2015-02-20T12:03:02.429Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: done: 177.314 sec
2015-02-20T12:03:04.581Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: done: 179.641 sec
2015-02-20T12:03:09.453Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: done: 184.573 sec
2015-02-20T12:03:10.362Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: done: 185.394 sec
2015-02-20T12:03:14.232Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: done: 189.118 sec
2015-02-20T12:03:16.347Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: done: 191.408 sec
2015-02-20T12:03:20.398Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: done: 195.439 sec
2015-02-20T12:03:22.947Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: done: 197.833 sec
2015-02-20T12:03:30.212Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: done: 205.228 sec
2015-02-20T12:03:39.931Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: done: 214.918 sec
2015-02-20T12:03:39.936Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: done: 214.906 sec
2015-02-20T12:03:51.694Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: done: 226.696 sec
So, for every measurement, Sidekiq seems to open 2 connections, but it never closes them. What should I do?
Sidekiq will reuse threads for different jobs in some cases, so we must be very careful to cleanup our ProcessContext to ensure we do not pollute the context of other jobs.
At the same time, Sidekiq uses multithreading so it is much more memory-efficient than Rescue.
Sidekiq is an open-source framework that provides efficient background processing for Ruby applications. It uses Redis as an in-memory data structure to store all of its job and operational data. It's important to be aware that Sidekiq by default doesn't do scheduling, it only executes jobs.
Sidekiq server process pulls jobs from the queue in Redis and processes them. Like your web processes, Sidekiq boots Rails so your jobs and workers have the full Rails API, including Active Record, available for use. The server will instantiate the worker and call perform with the given arguments.
The issue is that your celluloid actor is leaking connections: when you do something requiring an active record connection in a new thread (or even finer apparently) AR automatically checks out a connection but won't check it back in automatically.
If you bracket your database accessing code with
ActiveRecord::Base.connection_pool.with_connection do
# do stuff
end
that should clear up any connections created within (as you do in your sidekiq worker).
Alternatively you could try explicitly checking out a connection
@connection = ActiveRecord::Base.connection_pool.checkout
and then checking it back in in your finalizer
ActiveRecord::Base.connection_pool.checkin @connection
If your actor is very long lived you might end up hogging the connection longer than you need it though (the flipside is that it will be inefficient to be checking a connection in and out in a tight loop)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With