
RabbitMQ on Azure connection timeout

Tags: rabbitmq, azure

We are having problems porting our software to Azure. Our solution consists of 2 websites (frontend, backend) and a webjob (a Windows service when installed on our own hardware). These nodes communicate through a RabbitMQ cluster (2 Ubuntu VMs). On premises we have no problems, but when installed on Azure we see many errors like:

Publisher did not confirm message

or

Publish not confirmed before channel closed

or

SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 104.40.186.27:5672

On RabbitMQ we see this kind of error:

closing AMQP connection <0.390.0> (100.73.204.90:61152 -> 100.73.205.2:5672):
   {handshake_timeout,handshake}

The result is that messages are often not received correctly.
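To see whether the handshake timeouts come from one client or from all of them, the broker log can be tallied by client address. The following is a minimal sketch, not a definitive tool: the function name `handshake_timeout_clients` is hypothetical, and it assumes the log keeps the two-line shape shown above (a "closing AMQP connection" line followed by the `handshake_timeout` line).

```shell
# Hypothetical helper: count handshake timeouts per client IP in a RabbitMQ log.
# Assumes each "closing AMQP connection ... (client:port -> server:port):" line
# is followed by a line containing "handshake_timeout", as in the excerpt above.
handshake_timeout_clients() {
  awk '
    /closing AMQP connection/ {
      # Remember the client address from "(client:port -> server:port)".
      client = ""
      if (match($0, /\([0-9.]+:[0-9]+ ->/)) {
        client = substr($0, RSTART + 1, RLENGTH - 4)
        sub(/:[0-9]+$/, "", client)   # drop the ephemeral client port
      }
      next
    }
    /handshake_timeout/ && client != "" { count[client]++; client = "" }
    END { for (c in count) print count[c], c }
  ' "$1"
}

# Example call (the log path is an assumption; adjust it to your node):
#   handshake_timeout_clients /var/log/rabbitmq/rabbit@myhost.log
```

If every timeout comes from the same few addresses, that points at the path between those clients and the load balancer rather than at the broker itself.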

We use MassTransit on top of RabbitMQ for the actual message exchange. Here is our procedure to set up the environment:

We first create the 2 Ubuntu 14.04 virtual machines (A3: 4 cores, 7 GB) in the same cloud service. We create 2 public endpoints with a load balancer for ports 5672 and 15672. Our clients are hosted in Azure websites in the same region.

Here is our PowerShell script to create the 2 VMs:

$imageName = "b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04_1-LTS-amd64-server-20140927-en-us-30GB"

# Defined before first use: a PowerShell script runs top to bottom.
# Adds a load-balanced TCP endpoint with a TCP probe on the same port.
Function Add-RabbitMQEndpoint($vm,$port,$name)
{
        $lbName = $name + "_LB"
        $null = Add-AzureEndpoint -VM $vm -LocalPort $port -PublicPort $port -Name $name -Protocol tcp -LBSetName $lbName -ProbePort $port -ProbeProtocol tcp -ProbeIntervalInSeconds 15
}

# Create the VM in the cloud service and wait until it has booted.
$vmc = New-AzureVMConfig -Name $machineName -InstanceSize "Small" -Image $imageName -AvailabilitySetName $serviceName

$null = $vmc | Add-AzureProvisioningConfig -Linux -LinuxUser $user -Password $password
$null = $vmc | New-AzureVM -ServiceName $serviceName -WaitForBoot

$vm = Get-AzureVM -Name $machineName -ServiceName $serviceName

# Expose the AMQP port (5672) and the management UI port (15672).
$null = Add-RabbitMQEndpoint -vm $vm -port 5672 -name "RabbitMQ-Main"
$null = Add-RabbitMQEndpoint -vm $vm -port 15672 -name "RabbitMQ-Mgmt"

$null = $vm | Update-AzureVM

Then we run the following script to install RabbitMQ on both machines:

  sudo add-apt-repository 'deb http://www.rabbitmq.com/debian/ testing main'
  sudo apt-get update
  sudo apt-get -q -y --force-yes install rabbitmq-server=3.4.1-1

  sudo invoke-rc.d rabbitmq-server stop
  echo 'MYCOOKIEVALUE' | sudo tee /var/lib/rabbitmq/.erlang.cookie
  sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
  sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie
  sudo invoke-rc.d rabbitmq-server start

  sudo rabbitmq-plugins enable rabbitmq_management
  sudo invoke-rc.d rabbitmq-server stop
  sudo invoke-rc.d rabbitmq-server start

  sudo rabbitmqctl add_user user1 pwd1
  sudo rabbitmqctl set_user_tags user1 administrator
  sudo rabbitmqctl set_permissions -p / user1 '.*' '.*' '.*'

And then we create the cluster using:

  sudo rabbitmqctl stop_app
  sudo rabbitmqctl join_cluster rabbit@$mymachinename
  sudo rabbitmqctl start_app
  sudo rabbitmqctl set_cluster_name my_cluster_name

We have not opened any other ports (such as 4369 and 25672) because we assume these are only used for internal communication between nodes. Is that right? We connect to RabbitMQ from the client using the cloud service host name. We have also tried removing the cluster and connecting to a single RabbitMQ VM.
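For reference, the ports mentioned here can be grouped by who needs to reach them. This is a small sketch assuming RabbitMQ's default port assignments (5672 for AMQP, 15672 for the management plugin, 4369 for epmd, 25672 for inter-node traffic); the function names `classify_port` and `port_open` are hypothetical, and `port_open` additionally assumes `bash` and coreutils `timeout` are available.

```shell
# Hypothetical helper: say who needs to reach a given RabbitMQ port,
# assuming the default port assignments.
classify_port() {
  case "$1" in
    5671|5672)  echo "client" ;;      # AMQP (with and without TLS)
    15672)      echo "management" ;;  # management plugin HTTP API and UI
    4369|25672) echo "internal" ;;    # epmd and inter-node communication
    *)          echo "unknown" ;;
  esac
}

# Hypothetical helper: succeed if a TCP connection to host:port opens
# within the given number of seconds (uses bash's /dev/tcp).
port_open() {
  timeout "$3" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: from one cluster node, check that the other node is reachable
# on the internal ports (the host name is a placeholder):
#   for p in 4369 25672; do port_open other-node "$p" 5 && echo "$p ok"; done
```

Under these defaults only the "client" and "management" ports need public endpoints; the "internal" ones only have to be reachable between the two VMs.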

Do you have any ideas? It looks like some kind of timeout problem. Could it be a network partition problem?

asked Oct 19 '22 22:10 by Davide Icardi


1 Answer

I was deploying a configuration with one VPS running the RabbitMQ broker on Windows Server. On the server we have two .NET services communicating over RabbitMQ/MassTransit and a website doing request/response to the services over RabbitMQ/MassTransit as well.

We would get spurious timeouts, and RabbitMQ would fail to respond most of the time. I have just finished moving the VPS and the website onto a virtual network (VNET) in Azure, and this seems to solve the problem (fingers crossed). Beware that you have to update the broker address on the websites to the internal IP. The best way to make sure that connections go through the VNET is to close the public endpoints for RabbitMQ. As an extra upside, with this setup there is no need to worry about transport security, as RabbitMQ will only be accessible inside the VNET.
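A quick sanity check before deploying the new broker address is to confirm it really is an internal one. The sketch below assumes the VNET uses a private (RFC 1918) address space, which is common but not required; the function name `is_private_ipv4` and the address 10.0.0.4 are hypothetical.

```shell
# Hypothetical helper: succeed if the address is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
# This is a pattern check only; it does not validate that the input is a
# well-formed IPv4 address.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (10.0.0.4 is a placeholder for the broker's VNET address):
#   is_private_ipv4 10.0.0.4 || echo "broker address is not internal"
```

A public address such as the 104.40.186.27 from the question's SocketException would fail this check, which is the signal that traffic is still going through the public endpoint.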

answered Oct 23 '22 01:10 by Philip Kaare Løventoft