I have a Service Fabric cluster that appears to have been stuck in the rollback phase of an automatic upgrade for over seven days.
This is the output from Get-ServiceFabricClusterUpgrade:
TargetCodeVersion : 5.5.216.0
TargetConfigVersion : 2
StartTimestampUtc : 15/06/2017 23:44:40
FailureTimestampUtc : 16/06/2017 01:41:48
FailureReason : HealthCheck
UpgradeState : RollingBackInProgress
UpgradeDuration : 7.14:13:10
CurrentUpgradeDomainDuration : 7.12:16:03
CurrentUpgradeDomainProgress : 0
NodeName : xxxxxxxxxxxxxxxxxxxxx
UpgradePhase : PreUpgradeSafetyCheck
PendingSafetyChecks :
WaitForInbuildReplica - PartitionId: xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
NextUpgradeDomain : 1
UpgradeDomainsStatus : { "0" = "InProgress";
"1" = "Pending";
"2" = "Pending";
"3" = "Pending";
"4" = "Pending" }
The only other cmdlets in the Service Fabric PowerShell module that seem related are Start-ServiceFabricClusterUpgrade, Resume-ServiceFabricClusterUpgrade and Update-ServiceFabricClusterUpgrade.
I have tried Start-ServiceFabricClusterUpgrade with the -Force switch, hoping it would cancel the hanging upgrade and start a new one, but it did not. I have also restarted the node that is in progress, but that made no difference either.
In the absence of a Stop-ServiceFabricClusterUpgrade, is there anything else I can do to stop this process?
The "Troubleshoot application upgrades" documentation says:
"An UpgradePhase of PreUpgradeSafetyCheck means there were issues preparing the upgrade domain before it was performed. The most common issues in this case are service errors in the close or demotion from primary code paths."
So Service Fabric was probably unable to shut down the service executable. The easiest fix might be to use Service Fabric Explorer to deactivate (restart) the node named in the output.
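The same deactivate/reactivate cycle can also be done from PowerShell. This is a minimal sketch rather than a verified fix, and it assumes a session already connected with Connect-ServiceFabricCluster. The node name and partition ID are placeholders for the values masked in the output above, and the 10-second poll interval and 15-minute timeout are arbitrary choices.

```powershell
# Assumes Connect-ServiceFabricCluster has already been run in this session.
# Placeholders: use the NodeName and PartitionId that
# Get-ServiceFabricClusterUpgrade reports for the stuck upgrade domain.
$nodeName    = '<node-name>'
$partitionId = '<partition-id>'

# Look at the replicas of the partition named in the pending safety check.
Get-ServiceFabricReplica -PartitionId $partitionId

# Deactivate the node with restart intent (-Force skips the confirmation prompt).
Disable-ServiceFabricNode -NodeName $nodeName -Intent Restart -Force

# Poll until the node reports Disabled, giving up after an arbitrary 15 minutes.
# Deactivation runs its own safety checks, so it may also stall.
$deadline = (Get-Date).AddMinutes(15)
while ((Get-ServiceFabricNode -NodeName $nodeName).NodeStatus -ne 'Disabled') {
    if ((Get-Date) -gt $deadline) { Write-Warning 'Node did not reach Disabled'; break }
    Start-Sleep -Seconds 10
}

# Reactivate the node, then re-check the upgrade state.
Enable-ServiceFabricNode -NodeName $nodeName
Get-ServiceFabricClusterUpgrade
```

If the node never reaches Disabled, the same replica that blocks the upgrade is probably blocking the deactivation too, and restarting the node or VM may be the only option.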
What I did in the end was log onto the nodes in the cluster one by one and restart them, waiting for the previous one to come back up before restarting the next one.
This fixed it, and the upgrade process eventually finished. Restarting the VM scale set (VMSS) would probably have achieved the same thing, but I'm not sure whether there would have been a service outage during the restart. It certainly would have been less time-consuming.
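The one-node-at-a-time restart could be scripted roughly as follows. Note that this is a substitute technique, not what was done above: Restart-ServiceFabricNode restarts the Service Fabric node process, not the VM. The sketch assumes an existing Connect-ServiceFabricCluster session. Ordering by upgrade domain and the 15-second poll interval are assumptions, and the wait loop has no timeout.

```powershell
# Assumes Connect-ServiceFabricCluster has already been run in this session.
# Walk the nodes one at a time, ordered by upgrade domain (an assumption;
# any order that restarts a single node at a time mirrors the manual steps).
$nodes = Get-ServiceFabricNode | Sort-Object UpgradeDomain

foreach ($node in $nodes) {
    Write-Host "Restarting $($node.NodeName)"

    # Restarts the Service Fabric node process (not the VM) and waits
    # for the restart to be verified before returning.
    Restart-ServiceFabricNode -NodeName $node.NodeName -CommandCompletionMode Verify

    # Wait for the node to report Up before moving on to the next one.
    do {
        Start-Sleep -Seconds 15
        $status = (Get-ServiceFabricNode -NodeName $node.NodeName).NodeStatus
    } while ($status -ne 'Up')
}

# Check whether the rollback has moved past the stuck upgrade domain.
Get-ServiceFabricClusterUpgrade
```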
Two ways that I can see you accomplishing this: