I was inspired by the video "Scaling the Real-time Web with ASP.NET SignalR", specifically the section at 56 min 11 sec.
Imagine a web-based chat client using SignalR to communicate with the server. When the client connects, its endpoint information is stored in an Azure table.
A chat client can send a message to another chat client via SignalR. The server looks up the endpoint of the destination client (which may be on a different instance), uses Web API to forward the message to that instance, and that instance then delivers it to the client via SignalR.
To demonstrate, I have uploaded a sample application to GitHub.
This all works when there is a single Azure instance. However, if there are MULTIPLE Azure instances, the very final call to SignalR from the server to the client silently fails. It's as if the dynamic code just doesn't exist, or it's coming off a 'bad' thread, or the message has somehow been sent to the wrong instance, or I have just made a dopey mistake.
Any ideas would be greatly appreciated.
The web page is set up like this:
<input type="radio" name='ClientId' value='A' style='width:30px'/>Chat client A<br/>
<input type="radio" name='ClientId' value='B' style='width:30px'/>Chat client B<br/>
<input type='button' id='register' value='Register' />
<input type='text' id='txtMessage' size='50' /><input type='button' id='send' value='Send' />
<div id='history'>
</div>
and the JavaScript is:
<script type="text/javascript">
    $(function () {
        // Declare a proxy to reference the hub.
        var chat = $.connection.chatHub;
        chat.client.sendMessageToClient = function (message) {
            $('#history').append("<br/>" + message);
        };
        // Start the connection.
        $.connection.hub.start().done(function () {
            $('#register').click(function () {
                // Call the Register method on the hub.
                chat.server.register($('input[name=ClientId]:checked', '#myForm').val());
            });
            $('#send').click(function () {
                // Call the SendMessageToServer method on the hub.
                chat.server.sendMessageToServer($('input[name=ClientId]:checked', '#myForm').val(), $('#txtMessage').val());
            });
        });
    });
</script>
The hub is as follows. (I have a little storage class to store the endpoint information in an Azure table.) Notice the static method SendMessageToClient. This is what ultimately fails. It is called from the Web API class (below).
public class ChatHub : Hub
{
    public void Register(string chatClientId)
    {
        Storage.RegisterChatEndPoint(chatClientId, this.Context.ConnectionId);
    }
    /// <summary>
    /// Receives the message and sends it to the SignalR client.
    /// </summary>
    /// <param name="message">The message.</param>
    /// <param name="connectionId">The connection id.</param>
    public static void SendMessageToClient(string message, string connectionId)
    {
        GlobalHost.ConnectionManager.GetHubContext<ChatHub>().Clients.Client(connectionId).SendMessageToClient(message);
        Debug.WriteLine("Sending a message to the client on SignalR connection id: " + connectionId);
        Debug.WriteLine("Via the Web Api end point: " + RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["WebApi"].IPEndpoint.ToString());
    }
    /// <summary>
    /// Sends the message to other instance.
    /// </summary>
    /// <param name="chatClientId">The chat client id.</param>
    /// <param name="message">The message.</param>
    public void SendMessageToServer(string chatClientId, string message)
    {
        // Get the chatClientId of the destination.
        string otherChatClient = (chatClientId == "A" ? "B" : "A");
        // Find out this other chatClientId's end point
        ChatClientEntity chatClientEntity = Storage.GetChatClientEndpoint(otherChatClient);
        if (chatClientEntity != null)
            ChatWebApiController.SendMessage(chatClientEntity.WebRoleEndPoint, chatClientEntity.SignalRConnectionId, message);
    }
}
Finally, the ChatWebApiController is this:
public class ChatWebApiController : ApiController
{
    [HttpGet]
    public void SendMessage(string message, string connectionId)
    {
        //return message;
        ChatHub.SendMessageToClient(message, connectionId);
    }
    /// <summary>
    /// This calls the method above but on a different instance via Web API
    /// </summary>
    /// <param name="endPoint">The end point.</param>
    /// <param name="connectionId">The connection id.</param>
    /// <param name="message">The message.</param>
    public static void SendMessage(string endPoint, string connectionId, string message)
    {
        HttpClient client = new HttpClient();
        client.BaseAddress = new Uri("http://" + endPoint);
        // Add an Accept header for JSON format.
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        string url = "http://" + endPoint + "/api/ChatWebApi/SendMessage/?Message=" + HttpUtility.UrlEncode(message) + "&ConnectionId=" + connectionId;
        client.GetAsync(url);
    }
}
Firstly, in the absence of any community insight into this problem, I have devoted possibly a little too much time to getting to the bottom of it. I expect Microsoft to release some guidance on these matters in the coming months, but until then we are largely on our own.
The answer to this problem is remarkably complex but it all makes sense when you understand how SignalR is actually working under the hood. Apologies for the long answer but it is necessary in order to give this problem the energy that it deserves.
This solution only applies to multi-instance Azure and SignalR communications. If you are not on Azure (i.e. you are on Windows Server), or if you plan to run only one Azure instance, then it probably will not apply to you. This video is essential viewing, especially from 43 min 14 sec to the end: http://channel9.msdn.com/Events/Build/2013/3-502
Here we go…
If you ‘read the side of the box’ you would be led to believe that SignalR connected to Azure would be using WebSockets. This would make our life simple, since the single open socket connection between the client and Azure would inherently be bound to a single Azure instance and all of the communications could flow over that channel.
If you believe this then you would be wrong.
In the current release, SignalR against Azure does not use WebSockets (this is documented at http://www.asp.net/signalr/overview/getting-started/supported-platforms). IE10 as the client will make use of a “Forever Frame”, a somewhat ill-defined and exotic use of embedded iframes. Reading the excellent ebook found at http://campusmvp.net/signalr-ebook would suggest that it keeps a connection ‘forever’ open to the server. This is not entirely the case. Fiddler shows that the client opens a new HTTP connection every time it needs to communicate with the server, although the initial connection (the one that results in the OnConnected method being called) is kept open permanently. The URL will be of the format /signalr/connect?transport=foreverFrame&connectionToken= and its icon in Fiddler is a downward-pointing green arrow, which means ‘downloading’.
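To confirm which transport was actually negotiated without reaching for Fiddler, the SignalR JavaScript client exposes the transport name on the connection object once it has started. A minimal sketch follows; `describeTransport` is a hypothetical helper name, and the stub object at the end only stands in for a real connection so that the snippet runs on its own.

```javascript
// Hypothetical helper: reports which transport a SignalR connection negotiated.
// `connection` is expected to look like $.connection.hub after start() completes.
function describeTransport(connection) {
  var transport = connection && connection.transport;
  return "SignalR transport: " + (transport ? transport.name : "(not connected)");
}

// In the browser (assumes the jQuery SignalR client is loaded):
// $.connection.hub.start().done(function () {
//   console.log(describeTransport($.connection.hub));
// });

// Stand-alone demonstration with a stub object in place of a real connection.
console.log(describeTransport({ transport: { name: "foreverFrame" } }));
```

If this reports foreverFrame or longPolling rather than webSockets, every client-to-server call is a separate HTTP request and is subject to the load-balancing behaviour described here.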
We know that Azure makes use of a load balancer. Given that a forever frame will establish a new connection every time it needs to send a message to the server, how does the load balancer know to always send the message back to the Azure instance that was responsible for establishing the server side of the SignalR connection? The answer… it doesn’t; and depending on the application this may or may not be a problem. If the message to Azure simply needs to be recorded, or some other action taken, then read no further. You do not have a problem. Your server-side method will be invoked and you perform the action; simple.
However, if the message needs to be either sent back to the client via SignalR or sent to another client (e.g. in a chat application) then you have a lot more work to do. Which of the multiple instances can actually deliver the message? How do you find it? How can you get a message to that other instance?
In order to demonstrate how all of these aspects interact I have written a demo application that can be found at https://github.com/daveapsgithub/AzureSignalRInteration. The application has lots of details on its web page, but in short, if you run it you will readily see that the only instance that will successfully send a message back to the client is the instance on which the “OnConnected” method is received. Attempting to send a message to a client on any other instance will silently fail.
It also demonstrates that the load balancer is shunting messages to various instances, and that attempting to reply on any instance that is not the “OnConnected” instance will silently fail. Fortunately, irrespective of the instance that receives the message, the SignalR connection id remains the same for that client (as you would expect).
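The behaviour just described can be reproduced with a toy in-memory model. This is a sketch only, not SignalR code: `Instance`, `roundRobin`, the instance names and the connection id are all hypothetical, and the round-robin balancer is an assumption standing in for the real Azure load balancer.

```javascript
// Toy model: each instance only knows the connections it accepted itself.
function Instance(name) {
  this.name = name;
  this.connections = {}; // connectionId -> array of delivered messages
}

// Models OnConnected: this instance now holds the long-lived connection.
Instance.prototype.onConnected = function (connectionId) {
  this.connections[connectionId] = [];
};

// Models a server-to-client send. No error is raised when the connection
// is unknown to this instance; the message is simply dropped.
Instance.prototype.sendToClient = function (connectionId, message) {
  var inbox = this.connections[connectionId];
  if (!inbox) { return false; } // silent failure
  inbox.push(message);
  return true;
};

// Hypothetical round-robin balancer: returns a function that picks the
// next instance on every call.
function roundRobin(instances) {
  var next = 0;
  return function () {
    var chosen = instances[next];
    next = (next + 1) % instances.length;
    return chosen;
  };
}

var instances = [new Instance("IN_0"), new Instance("IN_1")];
var route = roundRobin(instances);

route().onConnected("conn-A"); // the connect request lands on IN_0

var results = [];
for (var i = 0; i < 4; i++) {
  var target = route(); // each later request may land on any instance
  results.push(target.name + ":" + target.sendToClient("conn-A", "hello " + i));
}
console.log(results.join(", "));
```

In this model only the replies attempted on IN_0 reach the client; the ones attempted on IN_1 return false without any error, which mirrors the silent failure seen in the demo application.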
With these lessons in mind I revisited my original question and updated the project, which can be found at https://github.com/daveapsgithub/AzureSignalRWebApi2. The handling of the Azure Table storage is slightly more complex now. Since the OnConnected method cannot be given any parameters, we are required to store the SignalR connection id and WebApi endpoint in the Azure table storage initially when OnConnected is called. Subsequently, when each client ‘registers’ itself as either client id ‘A’ or client id ‘B’, the registration call needs to look up the Azure Table storage row for that connection id and set the client id appropriately.
When A sends a message to B, we do not know which instance the message turns up on. But that is no longer a problem, since we simply look up the endpoint of ‘B’ and make a WebApi call to it, and the instance at that endpoint can then send the message to B via SignalR.
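The two-step registration and the lookup-then-forward flow can be sketched as a small in-memory model. Everything here is illustrative: the `table` object stands in for the Azure table, `forward` stands in for the WebApi call between instances, and the endpoint addresses and connection ids are made up.

```javascript
// Shared lookup table, standing in for the Azure table (hypothetical shape).
var table = {
  byConnection: {}, // connectionId -> { endPoint, clientId }
  // Step 1, from OnConnected: only the connection id and endpoint are known.
  register: function (connectionId, endPoint) {
    this.byConnection[connectionId] = { endPoint: endPoint, clientId: null };
  },
  // Step 2, from Register: attach the chat client id to the existing row.
  setClientId: function (connectionId, clientId) {
    var row = this.byConnection[connectionId];
    if (row) { row.clientId = clientId; }
  },
  findByClientId: function (clientId) {
    for (var id in this.byConnection) {
      if (this.byConnection[id].clientId === clientId) {
        return { connectionId: id, endPoint: this.byConnection[id].endPoint };
      }
    }
    return null;
  }
};

// endPoint -> messages handed to that instance; models the WebApi hop.
var delivered = { "10.0.0.1:8080": [], "10.0.0.2:8080": [] };
function forward(endPoint, connectionId, message) {
  delivered[endPoint].push({ connectionId: connectionId, message: message });
}

// Models SendMessageToServer running on whichever instance got the request.
function sendMessageToServer(fromClientId, message) {
  var other = fromClientId === "A" ? "B" : "A";
  var row = table.findByClientId(other);
  if (row === null) { return false; } // destination has not registered yet
  forward(row.endPoint, row.connectionId, message);
  return true;
}

table.register("conn-1", "10.0.0.1:8080");
table.register("conn-2", "10.0.0.2:8080");
table.setClientId("conn-1", "A");
table.setClientId("conn-2", "B");

sendMessageToServer("A", "hi B");
console.log(delivered["10.0.0.2:8080"]);
```

The point of the model is that the sender never needs to know which instance handled its own request; the table row for the destination is enough to route the message to the instance that holds the destination's connection.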
There are two major pitfalls that you need to be aware of. Firstly, if you are debugging and have a breakpoint in OnConnected and step through the code, then the client will probably time out and send a subsequent reconnection request (be sure to look at Fiddler). Once you have finished inspecting OnConnected, you will see that it is called again as part of the reconnection request. What could be the problem? The problem is that the reconnection request is a different HTTP request that had to go through the load balancer. You will now be debugging an entirely different instance with a different WebApi endpoint that is about to be stored in the database. This instance, although it received an ‘OnConnected’ call, is not THE ‘OnConnected’ instance. The first instance that received the OnConnected call is the only instance that can send messages back to the client. So in summary, do not do any time-consuming work in OnConnected (and if you have to, use some async pattern to run it on a separate thread so that OnConnected can return quickly).
Secondly, do not use two instances of IE10 to test SignalR applications that use this architecture. Use IE and another browser. If you open one IE that establishes the SignalR connection and then open another IE, the SignalR connection of the first browser is abandoned and the first IE starts to use the SignalR connection of the second IE. This is actually difficult to believe but refer to the Compute Emulator output windows for verification of this insanity.
Since the first browser has abandoned its original SignalR connection, it will also have been ‘moved’ to another Azure instance; the WebApi endpoint will not have been updated in the Azure table, and any messages that are sent to it will silently fail.
I have updated the source code posted as part of the original question to demonstrate it working. Other than the changes to the Azure table storage class, the code changes were minor. We simply need to add some code to the OnConnected method:
public override System.Threading.Tasks.Task OnConnected()
{
    Storage.RegisterChatEndPoint(this.Context.ConnectionId);
    staticEndPoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["WebApi"].IPEndpoint.ToString();
    staticConnectionId = this.Context.ConnectionId;
    return base.OnConnected();
}
public void Register(string chatClientId)
{
    Storage.RegisterChatClientId(chatClientId, this.Context.ConnectionId);
}
As commented, you definitely want to consider the supported scale-out solutions.
Given your use of Azure, the Azure Service Bus scaleout would seem to be the most relevant.
Could there be a typo in one of those dynamic method calls? In the following method:
public static void SendMessageToClient(string message, string connectionId)
{
    GlobalHost.ConnectionManager.GetHubContext<ChatHub>().Clients
        .Client(connectionId).SendMessageToClient(message);
    .....
}
Shouldn't the client call be camel-cased?
GlobalHost.ConnectionManager.GetHubContext<ChatHub>().Clients
    .Client(connectionId).sendMessageToClient(message);