When I log in to Service Fabric Explorer and try to disable a node for an OS upgrade I am presented with two options:
Can anyone tell me the difference?
The Restart-ServiceFabricNode cmdlet restarts a Service Fabric node by restarting the Fabric.exe process that hosts the node. This cmdlet simulates Service Fabric node failures in the cluster, which tests the failover recovery paths of your service.
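For reference, a minimal sketch of that call. The node name is a placeholder, and it assumes the session is already connected to the cluster with Connect-ServiceFabricCluster.

```powershell
# Hypothetical node name; substitute one returned by Get-ServiceFabricNode.
$nodeName = "_Node_0"

# Restart the Fabric.exe process hosting the node.
# -CommandCompletionMode Verify waits for the restart to be confirmed;
# DoNotVerify returns as soon as the request has been sent.
Restart-ServiceFabricNode -NodeName $nodeName -CommandCompletionMode Verify
```

Note that this is a fault-injection style restart, not a deactivation: it does not first wait for it to be safe to take the node down.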
At the moment you cannot stop a 'service' in Service Fabric. You can only remove it. But you can start/stop and enable/disable nodes within the cluster. There is an enhancement request in the Azure feedback forums for a service start/stop feature.
Alternatively, you can restart the cluster from the SF Cluster manager dashboard. If you redeploy the node, it is allocated a new temp drive and all the data on the old one is removed, which gives you the space back.
Use the Get-ServiceFabricNode cmdlet to view the disabling status of the node. Service Fabric ensures that services stay available even if these replicas are closed. The node stays in the disabling state until it is safe to disable it without affecting service availability.
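A rough sketch of how you might watch that status from PowerShell. The node name and the 10-second polling interval are placeholders, and it assumes you are already connected with Connect-ServiceFabricCluster.

```powershell
# Hypothetical node name; substitute your own.
$nodeName = "_Node_0"

# One-off look at the node's status and deactivation details.
Get-ServiceFabricNode -NodeName $nodeName |
    Select-Object NodeName, NodeStatus, NodeDeactivationInfo

# Poll until the node has left the Disabling state.
do {
    Start-Sleep -Seconds 10   # arbitrary interval
    $node = Get-ServiceFabricNode -NodeName $nodeName
    Write-Host "$($node.NodeName): $($node.NodeStatus)"
} while ($node.NodeStatus -eq 'Disabling')
```

Once the loop exits, check that NodeStatus is actually Disabled before touching the machine, since the loop only tells you the node is no longer in Disabling.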
Service Fabric has APIs that let you manage nodes (in C# these are DeactivateNodeAsync and ActivateNodeAsync, in PowerShell they're Enable-ServiceFabricNode and Disable-ServiceFabricNode). First of all, most of these are holdovers from when people managed their own clusters, and they should be used less often in the Azure-hosted Service Fabric cluster environment than when you run your own clusters. Either way, when deactivating a node there are several different options, which we call Intents.
You can think of these as representing increasingly severe operations on the nodes, which you'd use under different situations, and you use them to communicate to Service Fabric what is being done to the node.
The four different options are: Pause, Restart, RemoveData, and RemoveNode.
Now let's talk about when you'd use each. Pause is most common if you want to debug a given service, process, machine etc., and would like it to not be changed (to the degree possible) while you are looking at it. It would be a little awkward if you went to diagnose some behavior of a service only to find that we had just moved it on you.
Restart (which is the most common of these we see used) is used when for some reason you want to move all the workloads off the node. For example, Service Fabric uses this itself when upgrading the Service Fabric bits on the node: first we deactivate the node with intent Restart, and then we wait for that to complete (so we know your services are not running) before we shut down and upgrade our own code on that node.
RemoveData is where you know the node is being deprovisioned and will not be coming back (say the hard drives are going to be swapped out, or the hardware is being completely removed), or you know that if the node is coming back it's specifically going to be empty (say you're reimaging the machine). The difference between Restart and RemoveData is that for Restart, we know the node is coming back, so we keep the knowledge of the replicas on that node. For persistent replicas this means that we don't have to build the replicas again immediately. But for RemoveData we know that the replicas are not coming back, and so need to build any spares immediately before confirming that the node is safe to restart.
RemoveNode builds on top of RemoveData, and is an additional indicator that you have no specific plans to bring this node back. Since it's important to keep the SeedNodes up, SF will fail the call if the node to be removed is currently a Seed. If you really want to remove that specific node, you can reconfigure the cluster to use a different node as a seed. An example of when you'd want to use RemoveNode rather than RemoveData is scaling down a cluster: you'd explicitly call RemoveNode, since you intend for the nodes not to come back and want to make sure you're taking the right ones away so the underlying cluster doesn't collapse.
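To make the intents concrete, here is roughly what the disable call looks like in PowerShell for the OS-upgrade case from the question. The node name is a placeholder, and it assumes an existing Connect-ServiceFabricCluster session.

```powershell
# Hypothetical node name; substitute your own.
$nodeName = "_Node_0"

# OS patch where the node comes back with its data intact: intent Restart.
# -Force skips the confirmation prompt.
Disable-ServiceFabricNode -NodeName $nodeName -Intent Restart -Force

# Other intents use the same call; only the -Intent value changes:
#   -Intent Pause       # debugging, keep things where they are
#   -Intent RemoveData  # reimaging, or the node's data will be gone
#   -Intent RemoveNode  # scaling down, the node is not coming back
```

The call returns before the deactivation has finished, so check the node's status before you start work on the machine.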
Once the operation (whatever it is) is done and you want to re-enable the node, the corresponding call is Activate/Enable. Restarting a node doesn't cause it to become automatically re-enabled. So if you are done with the software patch (or whatever caused you to use intent Restart, for example), and you want services to be placed on the node again, you would call Enable/Activate with the appropriate node Name.
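A minimal sketch of the re-enable step, again with a placeholder node name and an existing cluster connection assumed:

```powershell
# Hypothetical node name; use the same node you disabled earlier.
$nodeName = "_Node_0"

# Allow Service Fabric to place services on the node again.
Enable-ServiceFabricNode -NodeName $nodeName

# Confirm the node is back.
Get-ServiceFabricNode -NodeName $nodeName | Select-Object NodeName, NodeStatus
```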
As an example of the deactivate/disable call, check out the PS API documentation here