Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can't reconnect to Azure Redis via StackExchange.Redis

Caveat: Okay so this is a weird one, and i'm not sure if SO is the right place.

I have an Azure Website connecting to an Azure Redis Cache instance. (using StackExchange.Redis)

Everything was great, then one day - the website couln't connect to Redis.

Error:

It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING

Here's my connection string:

mycache.redis.cache.windows.net,ssl=true,password=xxxxxx,syncTimeout=5000

Here were my diagnosis steps:

  1. Try and connect from local to Azure Redis. Result: SUCCESS (so code is good?)
  2. Try and spinup NEW Azure Redis instance, connect from Azure. Result: FAIL (website can't connect to ANY azure Redis instance?)
  3. Spinup NEW Azure Website, with same code as erroring code, pointing to existing Redis cache. Result: SUCCESS (um, what?)
  4. File new MVC website, add StackExchange.Redis, deploy to new Azure Website, connecting to Redis. Result: SUCCESS (so Redis is good?)
  5. Deploy above vanilla MVC website to existing Azure Website (so same code as 4, connecting to same Redis, only difference is it's using the old Azure Website physical machine/networking). Result: FAIL (wtf??)

So - i'm thinking Redis has "blacklisted" the Azure website? (is that even possible?) I know that the client (my code) won't try and keep reconnecting, but i've bounced the site many times, and it just can't reconnect to Redis.

The fact that spinning up a new Azure Website, with the same code connecting to the same Redis instance results in success, tells me that some kind of blacklisting/routing issue has occured in Azure/Redis.

Any ideas?

EDIT

Looks like the problem is Azure VNET. When my website is part of the Azure Virtual Network, it can't connect to Redis. But when i take it out of the network, it connects fine. Before today, this setup was working fine.

So im wondering if Azure has made a change so that websites in a VNET cannot connect to Azure Redis? (makes no sense i know)

EDIT 2:

Attached is the logs from the Redis connection attempt.

Exception: It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING connection-string-removed:6380,password=password-removed,ssl=True Connecting connection-string-removed:6380/Interactive... BeginConnect: connection-string-removed:6380 1 unique nodes specified Requesting tie-break from connection-string-removed:6380

__Booksleeve_TieBreak... Allowing endpoints 00:00:05 to respond... Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=32763,Min=1,Max=32767) Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767) connection-string-removed:6380 did not respond Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767) Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767) connection-string-removed:6380 failed to nominate (WaitingForActivation) No masters detected connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=1 Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s) Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago resetting failing connections to retry... retrying; attempts left: 2... 1 unique nodes specified Requesting tie-break from connection-string-removed:6380 > __Booksleeve_TieBreak... Allowing endpoints 00:00:05 to respond... Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=1,Max=32767) Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767) connection-string-removed:6380 did not respond Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767) Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767) connection-string-removed:6380 failed to nominate (WaitingForActivation) No masters detected connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, async=3, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=2 Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s) Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago resetting failing connections to retry... retrying; attempts left: 1... 1 unique nodes specified Requesting tie-break from connection-string-removed:6380 > __Booksleeve_TieBreak... Allowing endpoints 00:00:05 to respond... Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=32759,Min=1,Max=32767) EndConnect: connection-string-removed:6380 (socket shutdown) Connect complete: connection-string-removed:6380 All tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=11,Free=32756,Min=1,Max=32767) connection-string-removed:6380 faulted: SocketFailure on PING Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=11,Free=32756,Min=1,Max=32767) Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767) connection-string-removed:6380 failed to nominate (WaitingForActivation) No masters detected connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, async=7, socks=3; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=3 Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s) Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago

Can anyone decipher this?

like image 753
RPM1984 Avatar asked Nov 01 '22 01:11

RPM1984


1 Answers

I'm with the Azure Web Apps team - it looks like your VNET got into a particularly strange state, and was interrupting network connectivity for your app. I have fixed this behavior.

We are incredibly sorry for the inconvenience...

like image 173
Aleks B Avatar answered Nov 13 '22 02:11

Aleks B