
Turning off Service Fabric clusters overnight

We are working on an application that processes Excel files and produces output. Availability is not a big requirement.

Can we turn the VM scale sets off at night and turn them on again in the morning? Will this kind of setup work with Service Fabric? If so, is there a way to schedule it?

Avinash Gadiraju asked Sep 22 '16 13:09




3 Answers

Thank you all for replying. I had a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.

Response for initial question

A Service Fabric cluster must maintain a minimum number of nodes in the Primary node type in order for the system services to maintain a quorum and ensure the health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up, and Service Fabric will automatically recover from this quorum loss; however, this is not guaranteed, and the cluster may never be able to recover.
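To make the quorum point concrete, here is a minimal sketch. It assumes quorum means a simple majority of the configured replicas; the function names are hypothetical and are not part of any Service Fabric API.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def has_quorum(replicas_up: int, replica_count: int) -> bool:
    """True while a majority of the configured replicas is still running."""
    return replicas_up >= quorum_size(replica_count)


if __name__ == "__main__":
    # Assumption: 5 replicas of a system service, one per primary node.
    print(quorum_size(5))    # majority of 5
    print(has_quorum(3, 5))  # two nodes down
    print(has_quorum(0, 5))  # every VM stopped overnight
```

With every VM stopped, no replica is up, so the majority check fails regardless of the replica count. That is the situation the rep describes as quorum loss.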

However, if you do not need to save state in your cluster, it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than half an hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy it using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you need to keep other resources such as the storage account, you could also configure the ARM template to delete only the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
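The rep suggests PowerShell for this. As an alternative illustration only, the sketch below builds the equivalent Azure CLI commands from Python. The resource group name, location, and file names are placeholders I made up, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List


def teardown_command(resource_group: str) -> List[str]:
    """Evening step: delete the whole resource group without prompting."""
    return ["az", "group", "delete", "--name", resource_group, "--yes"]


def recreate_commands(resource_group: str, location: str,
                      template_file: str, parameters_file: str) -> List[List[str]]:
    """Morning steps: recreate the group, then deploy the ARM template into it."""
    return [
        ["az", "group", "create", "--name", resource_group, "--location", location],
        ["az", "deployment", "group", "create",
         "--resource-group", resource_group,
         "--template-file", template_file,
         "--parameters", "@" + parameters_file],
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own names and files.
    commands = [teardown_command("my-sf-rg")]
    commands += recreate_commands("my-sf-rg", "westeurope",
                                  "cluster.json", "cluster.parameters.json")
    for command in commands:
        print(shlex.join(command))
```

A scheduler of your choice would run the teardown in the evening and the recreate steps in the morning. Deploying the application after the cluster is up is a separate step that is not shown here.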

Q) Is there a better way to stop/start the VMs rather than directly from the scale set?

If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
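A minimal sketch of what stopping and starting from the scale set could look like with the Azure CLI, wrapped in Python. It uses `deallocate` rather than a plain stop because VMs that are stopped but still allocated continue to incur compute charges. The resource group and scale set names are placeholders, and the quorum risk described above still applies if this is run against the primary node type.

```python
import subprocess
from typing import List

ALLOWED_ACTIONS = ("deallocate", "start")


def vmss_power_command(action: str, resource_group: str, scale_set: str) -> List[str]:
    """Build the az CLI call that deallocates or starts every VM in a scale set."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("action must be one of %s" % (ALLOWED_ACTIONS,))
    return ["az", "vmss", action,
            "--resource-group", resource_group,
            "--name", scale_set]


def run(command: List[str], dry_run: bool = True) -> None:
    """Print the command by default; execute it only when dry_run is False."""
    if dry_run:
        print(" ".join(command))
    else:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder names for illustration.
    run(vmss_power_command("deallocate", "my-sf-rg", "worker"))
    run(vmss_power_command("start", "my-sf-rg", "worker"))
```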

Q) Can we create a primary set with the cheapest VMs we can find and add a secondary set of powerful VMs that we can turn on and off?

Yes, it is definitely possible to create two node types (a Primary that is small/cheap, and a ‘Worker’ that is a larger size) and set placement constraints on your application to only deploy to those larger VMs. However, if your Service Fabric service stores state, you will still run into a similar problem: once your worker VMs drop below quorum (fewer than 3 replicas/nodes), there is no guarantee that your SF service will come back with all of its state intact. In this case the cluster itself would still be fine, since the Primary nodes are running, but your service’s state may be left in an unknown replication state.

I think you have a few options:

  1. Instead of storing state within Service Fabric’s reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections to maintain a faster read cache; just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster whenever you want.
  2. Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
  3. Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
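
Option 1 above is essentially a write-through cache. The sketch below illustrates the idea under loud assumptions: `InMemoryStore` is a stand-in for Azure Storage or SQL Azure, and none of these class names come from Service Fabric or any Azure SDK.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a durable external store."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class WriteThroughCache:
    """Persists every write externally first, then keeps a local read copy."""

    def __init__(self, store: InMemoryStore) -> None:
        self._store = store
        self._cache: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self._store.put(key, value)  # durable write happens before caching
        self._cache[key] = value

    def read(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)  # cache miss: fall back to the store
        if value is not None:
            self._cache[key] = value
        return value


if __name__ == "__main__":
    store = InMemoryStore()
    before = WriteThroughCache(store)
    before.write("job-1", "done")
    # A freshly recreated cluster starts with an empty cache but the same store.
    after = WriteThroughCache(store)
    print(after.read("job-1"))
```

Because nothing lives only in the cache, discarding it (which is what deleting the cluster does) loses no data.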

Avinash Gadiraju answered Sep 18 '22 23:09


No. You would have to delete the cluster, then recreate it and redeploy the application in the morning.

Todd Abel answered Sep 18 '22 23:09


Turning off the cluster is, as Todd said, not an option. However, you can scale down the number of VMs in the cluster.

During the day you would run the number of VMs required. At night you can scale down to the minimum of 5. Check this page for how to scale VM scale sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
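
A rough sketch of how the day/night capacity could be chosen and turned into an Azure CLI scale call. The working hours, names, and daytime capacity are made-up assumptions; the floor of 5 is the minimum mentioned above. The linked page describes the supported scale-down procedure, which this sketch does not replace.

```python
from typing import List

MIN_NODES = 5  # night-time floor taken from the answer above


def capacity_for_hour(hour: int, day_capacity: int,
                      day_start: int = 7, day_end: int = 19) -> int:
    """Return the target VM count for the given hour (0-23), never below MIN_NODES."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    wanted = day_capacity if day_start <= hour < day_end else MIN_NODES
    return max(wanted, MIN_NODES)


def scale_command(resource_group: str, scale_set: str, capacity: int) -> List[str]:
    """Build the az CLI call that sets the scale set to the target capacity."""
    return ["az", "vmss", "scale",
            "--resource-group", resource_group,
            "--name", scale_set,
            "--new-capacity", str(capacity)]


if __name__ == "__main__":
    # Placeholder names; 10 daytime nodes is an arbitrary example.
    for hour in (12, 23):
        target = capacity_for_hour(hour, day_capacity=10)
        print(" ".join(scale_command("my-sf-rg", "primary", target)))
```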

Martin answered Sep 19 '22 23:09