When I log in to Service Fabric Explorer and try to disable a node for an OS upgrade, I am presented with two options:

- Deactivate (Pause)
- Deactivate (Restart)

Can anyone tell me the difference?


Service Fabric Deactivate (pause) vs Deactivate (restart)?

1 Answer

Service Fabric has APIs that let you manage nodes (in C# these are DeactivateNodeAsync and ActivateNodeAsync; in PowerShell they're Disable-ServiceFabricNode and Enable-ServiceFabricNode). First of all, most of these are holdovers from when people managed their own clusters, and they should be needed less often in an Azure-hosted Service Fabric cluster than when you run your own. Either way, when deactivating a node there are several different options, which we call Intents.

You can think of these as representing increasingly severe operations on the nodes, which you'd use under different situations, and you use them to communicate to Service Fabric what is being done to the node.

The four different options are:

1. Pause - effectively "pauses" the node: services on it will continue to run, but no services should move in or out of the node unless they fail on their own, or unless moving a service to the node is necessary to prevent outage or inconsistency.
2. Restart - this will move all of the in-memory stateful and stateless services off the node, and then shut down (close) any persistent services (if it is safe to do so; if not, we'll build spares).
3. RemoveData - this will close down all of the services on the node, again building spares first if it is necessary for safety. The user is responsible for ensuring that if the node does come back, it comes back empty.
4. RemoveNode - this will close down all of the services on the node, again building spares first if necessary for safety. In this case, though, you're specifically telling SF that this node isn't coming back. SF performs an additional check to make sure that the node being removed isn't a SeedNode (one of the nodes currently responsible for maintaining the underlying cluster). Other than that, this is the same as RemoveData.
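
As a rough sketch of how the four intents map onto the PowerShell cmdlet mentioned above: the endpoint `localhost:19000` and the node name `Node1` are placeholders assumed for illustration, and only one of the `Disable-ServiceFabricNode` calls would be issued for a given node.

```powershell
# Assumes the Service Fabric PowerShell module is installed.
# "localhost:19000" and "Node1" are placeholders for your own cluster and node.
Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"

# Pick exactly one intent, matching what is actually going to happen to the node.
Disable-ServiceFabricNode -NodeName "Node1" -Intent Pause          # keep services in place while you investigate
# Disable-ServiceFabricNode -NodeName "Node1" -Intent Restart      # node goes down and returns with its data
# Disable-ServiceFabricNode -NodeName "Node1" -Intent RemoveData   # node may return, but empty
# Disable-ServiceFabricNode -NodeName "Node1" -Intent RemoveNode   # node is not coming back
```

Without `-Force`, the cmdlet should prompt for confirmation before deactivating the node.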

Now let's talk about when you'd use each.

Pause is most common if you want to debug a given service, process, machine, etc., and would like it not to change (to the degree possible) while you are looking at it. It would be a little awkward if you went to diagnose some behavior of a service only to find that we had just moved it on you.

Restart (the most common of these that we see used) is for when you want to move all the workloads off the node for some reason. For example, Service Fabric uses this itself when upgrading the Service Fabric bits on the node: first we deactivate the node with intent Restart, and then we wait for that to complete (so we know your services are not running) before we shut down and upgrade our own code on that node.

RemoveData is for when you know the node is being deprovisioned and will not be coming back (say the hard drives are going to be swapped out, or the hardware is being completely removed), or you know that if the node does come back it will be empty (say you're reimaging the machine). The difference between Restart and RemoveData is that for Restart, we know the node is coming back, so we keep the knowledge of the replicas on that node. For persistent replicas this means that we don't have to build the replicas again immediately. For RemoveData we know that the replicas are not coming back, so we need to build any spares immediately, before confirming that the node is safe to restart.

RemoveNode builds on top of RemoveData, and is an additional indicator that you have no specific plans to bring this node back. Since it's important to keep the SeedNodes up, SF will fail the call if the node to be removed is currently a seed node. If you really want to remove that specific node, you can reconfigure the cluster to use a different node as a seed. As an example of when you'd choose RemoveNode over RemoveData: if you're scaling down a cluster, you'd explicitly call RemoveNode, since you intend for the nodes not to come back and want to make sure you're taking the right ones away so the underlying cluster doesn't collapse.
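
Since a RemoveNode deactivation is rejected for seed nodes, it can help to check before calling it. The following is a minimal sketch, assuming an already connected session (`Connect-ServiceFabricCluster`) and a placeholder node name `Node1`; it relies on the `IsSeedNode` property of the object returned by `Get-ServiceFabricNode`.

```powershell
$nodeName = "Node1"   # placeholder; substitute the node you plan to remove

# Look up the node and inspect whether it is currently acting as a seed node.
$node = Get-ServiceFabricNode -NodeName $nodeName

if ($node.IsSeedNode) {
    # SF is expected to fail a RemoveNode deactivation here, so stop early.
    Write-Warning "$nodeName is currently a seed node; reconfigure the seeds before removing it."
}
else {
    Disable-ServiceFabricNode -NodeName $nodeName -Intent RemoveNode -Force
}
```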

Once the operation (whatever it is) is done and you want to re-enable the node, the corresponding call is Activate/Enable. Restarting a node doesn't cause it to become automatically re-enabled. So if you are done with the software patch (or whatever caused you to use intent Restart, for example) and you want services to be placed on the node again, you would call Enable/Activate with the appropriate node name.
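
Put together, a maintenance cycle such as an OS patch might look roughly like the sketch below. It assumes an already connected session and a placeholder node name `Node1`. Polling `NodeStatus` until it reads `Disabled` is one way to wait for the deactivation to finish, not the only one, and the 5-second interval is arbitrary.

```powershell
$nodeName = "Node1"   # placeholder; substitute your own node name

# Ask Service Fabric to move or close the workloads on the node.
Disable-ServiceFabricNode -NodeName $nodeName -Intent Restart -Force

# Wait until the deactivation has completed before touching the machine.
while ((Get-ServiceFabricNode -NodeName $nodeName).NodeStatus -ne 'Disabled') {
    Start-Sleep -Seconds 5
}

# ... patch the OS and reboot the machine here ...

# The node stays deactivated after the reboot until it is explicitly re-enabled.
Enable-ServiceFabricNode -NodeName $nodeName
```

In a real script you would probably want a timeout on the polling loop, since a deactivation can stay pending while Service Fabric waits for it to be safe.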

For an example of the deactivate/disable call, see the PowerShell documentation for Disable-ServiceFabricNode.


answered Oct 05 '22 00:10

masnider


Tags:

azure

azure-service-fabric

Wojtek Turowicz
