
Increase connect(2) timeout in RestClient / Net::HTTP on AWS Linux

I'm using rest-client to POST to a very slow web service. I'm setting timeout to 600 seconds, and I've confirmed that it's being passed down to Net::HTTP's @read_timeout and @open_timeout.

However, after about two minutes, I get a low-level timeout error: Errno::ETIMEDOUT: Connection timed out - connect(2).

The relevant part of the backtrace is

Operation timed out - connect(2) for [myhost] port [myport]
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:879:in `initialize'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:879:in `open'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:879:in `block in connect'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/timeout.rb:88:in `block in timeout'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/timeout.rb:98:in `call'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/timeout.rb:98:in `timeout'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:878:in `connect'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:863:in `do_start'
/Users/dmoles/.rvm/rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:852:in `start'
/Users/dmoles/.rvm/gems/ruby-2.2.5/gems/rest-client-2.0.0/lib/restclient/request.rb:766:in `transmit'
/Users/dmoles/.rvm/gems/ruby-2.2.5/gems/rest-client-2.0.0/lib/restclient/request.rb:215:in `execute'
/Users/dmoles/.rvm/gems/ruby-2.2.5/gems/rest-client-2.0.0/lib/restclient/request.rb:52:in `execute'

It looks like the line of code throwing the error is

TCPSocket.open(conn_address, conn_port, @local_host, @local_port)

It seems as though the underlying connect(2) system call has a timeout of about two minutes, and the timeout parameters passed to Net::HTTP can only shorten that, not lengthen it. Is there a way to modify the socket parameters to set a longer timeout?

Edited to add: This only appears to be a problem on our AWS Linux servers. On my macOS development machine, the ten-minute timeout works. I assume the default connect() timeout is longer on macOS/BSD, but I don't really know.

David Moles asked Nov 15 '16 00:11


3 Answers

First of all, you could increase the number of SYN retries by raising the value in /proc/sys/net/ipv4/tcp_syn_retries (the net.ipv4.tcp_syn_retries sysctl).
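To see why the failure shows up after roughly two minutes, the effect of tcp_syn_retries can be estimated with a little arithmetic. The sketch below is an approximation, not a kernel-exact model: it assumes an initial retransmission timeout of 1 second that doubles after every SYN and is capped at 120 seconds. The helper name approx_connect_timeout is made up for illustration.

```ruby
# Rough estimate of how long connect(2) waits before giving up on Linux.
# Assumptions (not taken from the question): initial RTO of 1 second,
# doubling after each unanswered SYN, capped at 120 seconds.
def approx_connect_timeout(syn_retries, initial_rto: 1, max_rto: 120)
  # One wait for the initial SYN plus one wait per retry.
  (0..syn_retries).inject(0) do |total, attempt|
    total + [initial_rto * 2**attempt, max_rto].min
  end
end

path = '/proc/sys/net/ipv4/tcp_syn_retries'
if File.readable?(path)
  retries = File.read(path).to_i
  puts "tcp_syn_retries=#{retries}, roughly #{approx_connect_timeout(retries)}s"
end

puts approx_connect_timeout(6) # 1+2+4+8+16+32+64 = 127
```

With 6 retries (the usual Linux default) this gives 127 seconds, which is consistent with the "about two minutes" seen in the question.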

If that doesn't work, I think you will need to enable the SO_KEEPALIVE or TCP_USER_TIMEOUT socket options. But there is probably no interface for that in rest-client.

So maybe you'll need to make a fork or create the Socket and Socket::Option by yourself.
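As a minimal sketch of what "creating the Socket and Socket::Option yourself" might look like: the helper name build_tuned_socket and the 600-second value are assumptions for illustration, and whether these options actually change the connect(2) timeout on a given kernel has not been verified here. TCP_USER_TIMEOUT is Linux-specific, so the code checks that the constant exists before using it.

```ruby
require 'socket'

# Hypothetical helper: returns an unconnected TCP socket with the options
# mentioned above already applied.
def build_tuned_socket(user_timeout_ms: 600_000)
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)

  # Build the option explicitly as a Socket::Option and apply it.
  sock.setsockopt(Socket::Option.bool(:INET, :SOCKET, :KEEPALIVE, true))

  # TCP_USER_TIMEOUT is not defined on every platform.
  if Socket.const_defined?(:TCP_USER_TIMEOUT)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_USER_TIMEOUT, user_timeout_ms)
  end

  sock
end

# Connecting is then done by hand, for example:
#   sock = build_tuned_socket
#   sock.connect(Socket.pack_sockaddr_in(port, host))
```

Getting Net::HTTP or rest-client to use such a socket is a separate problem, which is why a fork or a patch may be needed.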

Mike Perham wrote about it in his blog.

André Guimarães Sakata answered Oct 05 '22 23:10


Maybe you are running out of sockets. A socket needs some time before it becomes available again, so if you are opening too many connections in a short period of time, this may be the problem.

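Run ulimit -n to check the maximum number of open file descriptors. Remember that a socket is a file, so you need to raise that limit to be able to open more sockets. To change it, run ulimit -n 1000000 in the shell that starts the process. Note that ulimit is a shell builtin, so sudo ulimit will not work, and going above the hard limit requires root (for example via /etc/security/limits.conf).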
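The same limit can also be read from inside the Ruby process, which is useful when the process is started by a service manager rather than from an interactive shell. This is a small sketch; the method name open_file_limits is made up for illustration.

```ruby
# Reads the open-file limits that apply to the current Ruby process.
def open_file_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { soft: soft, hard: hard }
end

limits = open_file_limits
puts "open files: soft=#{limits[:soft]}, hard=#{limits[:hard]}"
```

The soft value is the one that corresponds to ulimit -n; the hard value is the ceiling an unprivileged process can raise it to.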


Andrés answered Oct 06 '22 00:10


I'm not sure about a two-minute limit, but AWS NATs have a 350s idle timeout. We had the same issue with our Sidekiq instances: we had http_read_timeout set to 15m (for a Lambda invocation), and even though the Lambda completed in less than 15m, we still received this error.

To fix, we did two things:

  • set tcp_keepalive_time to less than 350s
  • enabled SO_KEEPALIVE on all sockets

For us, the culprit was the AWS SDK, which uses Net::HTTP and does not set this option. Because we did not see a way to override the HTTP adapter for the AWS v3 SDK, we resorted to this in an initializer:

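require 'net/http'
require 'socket'

# Enables SO_KEEPALIVE on every socket that Net::HTTP opens.
module KeepAliveAwareNetHttp
  # Net::HTTP calls on_connect once the connection has been established.
  def on_connect
    # @socket is a Net::BufferedIO; #io returns the underlying socket object.
    @socket.io.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
    super
  end
end

Net::HTTP.prepend(KeepAliveAwareNetHttp)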

In order to verify this on your server (to see if there are any TCP sockets that have this set) you can run ss -te. If there is a socket that has this enabled, it'll look something like this:

ESTAB   0         0                171.190.0.6:53254        100.80.12.28:5432     timer:(keepalive,3min11sec,0) ino:113741 sk:90 <->

The time indicates how much time is remaining before it will send the next keep-alive packet.
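If changing the system-wide tcp_keepalive_time is not an option, the keepalive timing can also be set per socket. The following is only a sketch of that alternative, not what we deployed: the helper name enable_keepalive and the 300/30/5 values are assumptions chosen to stay under the 350s limit, and the TCP_KEEP* constants are platform-specific, so each one is applied only if Ruby defines it.

```ruby
require 'socket'

# Hypothetical helper: enables keepalive on a socket and, where the platform
# supports it, sets the idle time, probe interval, and probe count.
def enable_keepalive(sock, idle: 300, interval: 30, probes: 5)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

  { TCP_KEEPIDLE: idle, TCP_KEEPINTVL: interval, TCP_KEEPCNT: probes }.each do |name, value|
    # Skip options that this platform's Socket does not define.
    next unless Socket.const_defined?(name)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), value)
  end

  sock
end
```

Inside the on_connect hook, this could be called as enable_keepalive(@socket.io) in place of the single setsockopt call.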

Levi answered Oct 05 '22 23:10