I'm experimenting with a setup that is very much like the one detailed in the image here: https://raw.githubusercontent.com/Oreste-Luci/netflix-oss-example/master/netflix-oss-example.png
In my setup, I'm using a client application (Siege, https://www.joedog.org/siege-home/), a proxy (Zuul), a discovery service (Eureka) and a simple microservice. Everything is deployed on PWS.
I want to migrate from one version of my simple microservice to the next without any downtime. Initially I started out with the technique described here: https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
In my opinion, this approach is not "compatible" with a discovery service such as Eureka. In fact, the new version of my service is registered in Eureka and receives traffic even before I can remap all the routes (CF Router).
This led me to another approach, in which I rely on the failover mechanisms in Spring Cloud/Netflix:
As I understand it, Zuul uses Ribbon (load balancing) under the hood, so in that split second where the old instance is still in Eureka but actually shutting down, I expect a retry on the new instance without any impact on the client.
However, my assumption is wrong. I get a few 502 errors in my client:
Lifting the server siege... done.
Transactions: 5305 hits
Availability: 99.96 %
Elapsed time: 59.61 secs
Data transferred: 26.06 MB
Response time: 0.17 secs
Transaction rate: 89.00 trans/sec
Throughput: 0.44 MB/sec
Concurrency: 14.96
Successful transactions: 5305
Failed transactions: 2
Longest transaction: 3.17
Shortest transaction: 0.14
Part of my application.yml:

    server:
      port: ${PORT:8765}

    info:
      component: proxy

    ribbon:
      MaxAutoRetries: 2 # Max number of retries on the same server (excluding the first try)
      MaxAutoRetriesNextServer: 2 # Max number of next servers to retry (excluding the first server)
      OkToRetryOnAllOperations: true # Whether all operations can be retried for this client
      ServerListRefreshInterval: 2000 # Interval to refresh the server list from the source
      ConnectTimeout: 3000 # Connect timeout used by Apache HttpClient
      ReadTimeout: 3000 # Read timeout used by Apache HttpClient

    hystrix:
      threadpool:
        default:
          coreSize: 50
          maxQueueSize: 100
          queueSizeRejectionThreshold: 50
      command:
        default:
          execution:
            isolation:
              thread:
                timeoutInMilliseconds: 10000
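A quick sanity check on the numbers in this config. As far as I know, Zuul's Ribbon command derives its overall timeout as (ConnectTimeout + ReadTimeout) * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1); treat that formula as an assumption to verify against your Spring Cloud version. The sketch below plugs in the values above (RetryBudget is a made-up helper name, not part of any library):

```java
final class RetryBudget {

    // Total attempts: every server tried gets (maxAutoRetries + 1) tries,
    // and up to (maxAutoRetriesNextServer + 1) servers are tried.
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // Worst case: every attempt uses up both the connect and the read timeout.
    static long worstCaseMillis(long connectTimeoutMs, long readTimeoutMs,
                                int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (connectTimeoutMs + readTimeoutMs)
                * maxAttempts(maxAutoRetries, maxAutoRetriesNextServer);
    }

    public static void main(String[] args) {
        // Values taken from the ribbon section of the application.yml above.
        System.out.println(maxAttempts(2, 2));
        System.out.println(worstCaseMillis(3000, 3000, 2, 2));
    }
}
```

With these values that is up to 9 attempts and a worst case of 54000 ms, well above the 10000 ms Hystrix timeout, so Hystrix could cut a request short before Ribbon has used all its retries. Whether that accounts for the 502s is not established here.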
I'm not sure what goes wrong.
Is this a technical issue?
Or am I making the wrong assumptions (I did read somewhere that POSTs are not retried anyway, which I don't really understand)?
I'd love to hear how you do it.
Thanks, Andy
I've wondered about this also. I won't claim to have used Spring Cloud "In Anger". I've just been experimenting with it for a while.
Assumption: if the source of truth for all instance state is stored in Eureka, then Eureka should be our mechanism of operational control. We can use Eureka to take an instance out of service by setting the instance state to OUT_OF_SERVICE. When Ribbon refreshes its server list, it will not use these out-of-service instances. Eureka provides a REST API for querying instances and setting instance state. Great.
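For a single instance, the status change goes through Eureka's REST API as a PUT on the instance's status resource. A minimal sketch, assuming Java 11+ (java.net.http) and a Spring Cloud Eureka server that serves the API under /eureka; the base URL, app name and instance id are made up, and EurekaStatusRequests is just an illustrative helper:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class EurekaStatusRequests {

    // Builds: PUT {eurekaBaseUrl}/apps/{appId}/{instanceId}/status?value=OUT_OF_SERVICE
    static HttpRequest takeOutOfService(String eurekaBaseUrl, String appId, String instanceId) {
        URI uri = URI.create(eurekaBaseUrl + "/apps/" + appId + "/" + instanceId
                + "/status?value=OUT_OF_SERVICE");
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical values; nothing is sent here, the request is only built and printed.
        HttpRequest request = takeOutOfService("http://localhost:8761/eureka", "FOOBAR", "foobar-1");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```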
The problem is: How do I identify which instances are in the Blue group and which instances are in the Green group?
I was thinking... Eureka provides a metadata map for each instance. Say we set a version ID in the metadata map during our build / bake step? We could use a Git commit ID, some semantic versioning scheme, or whatever. OK, now I can look at the Eureka metadata and tell Blue from Green instances by that version value. We can set the metadata values in each service using properties.
e.g. eureka.instance.metadataMap.version=8675309
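To make the Blue/Green split concrete, here is a small sketch in plain Java (no Eureka classes) that groups instance ids by their "version" metadata value. The instance ids, the "unknown" bucket and the VersionGroups name are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class VersionGroups {

    // Groups instance ids by the "version" entry of their metadata map.
    // Instances without a version end up under "unknown".
    static Map<String, List<String>> byVersion(Map<String, Map<String, String>> metadataByInstanceId) {
        return metadataByInstanceId.entrySet().stream()
                .collect(Collectors.groupingBy(
                        entry -> entry.getValue().getOrDefault("version", "unknown"),
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> instances = new LinkedHashMap<>();
        instances.put("foobar-1", Map.of("version", "8675309")); // old (Blue) build
        instances.put("foobar-2", Map.of("version", "85839c2")); // new (Green) build
        instances.put("foobar-3", Map.of("version", "8675309"));
        System.out.println(byVersion(instances));
    }
}
```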
Now what would be nice is if we could just tell Eureka: "Take all the instances for the FOOBAR service and version 8675309 out of service." Well, I don't think that is provided out of the box. The cool thing about Spring Cloud is that all these services, including Eureka Server, are just Spring apps that we can hack for our own needs. The code below exposes an endpoint that sets instances to "out of service" given an app name and a version. Just add this controller to your Eureka Server. It's not production ready, just an idea really.
Once Eureka has taken these instances out of service and Ribbon has refreshed its server list, it is safe to kill or route away from these instances.
POST to:
http://[eurekahost:port]/takeInstancesOutOfService?applicationName=FOOBAR&version=8675309
Hope that helps?
    import java.util.Collection;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    import com.netflix.appinfo.InstanceInfo;
    import com.netflix.appinfo.InstanceInfo.InstanceStatus;
    import com.netflix.discovery.shared.Application;
    import com.netflix.eureka.EurekaServerContextHolder;
    import com.netflix.eureka.registry.PeerAwareInstanceRegistry;

    @RestController
    public class EurekaInstanceStateController {

        /** Lists the instances of an application whose "version" metadata matches. */
        @RequestMapping(value="/instancesQuery", method=RequestMethod.POST)
        public Collection<String> queryInstancesByMetaData(
                @RequestParam("applicationName") String applicationNameCriteria,
                @RequestParam("version") String versionCriteria)
        {
            return getRegistry().getSortedApplications()
                    .stream()
                    .filter(hasApplication(applicationNameCriteria))
                    .flatMap(app -> app.getInstances().stream())
                    .filter(hasVersion(versionCriteria))
                    .map(info -> info.getAppName() + " - " + info.getId() + " - " + info.getStatus() + " - " + info.getMetadata().get("version"))
                    .collect(Collectors.toList());
        }

        /** Sets every matching instance to OUT_OF_SERVICE and reports the result per instance. */
        @RequestMapping(value="/takeInstancesOutOfService", method=RequestMethod.POST)
        public Collection<String> takeInstancesOutOfService(
                @RequestParam("applicationName") String applicationNameCriteria,
                @RequestParam("version") String versionCriteria)
        {
            return getRegistry().getSortedApplications()
                    .stream()
                    .filter(hasApplication(applicationNameCriteria))
                    .flatMap(app -> app.getInstances().stream())
                    .filter(hasVersion(versionCriteria))
                    .map(instance -> updateInstanceStatus(instance, InstanceStatus.OUT_OF_SERVICE))
                    .collect(Collectors.toList());
        }

        /**
         * Updates the status of a single instance in the registry.
         *
         * @param instance the instance to update
         * @param status the status to set
         * @return a line with the app name, the instance id and whether the update succeeded
         */
        private String updateInstanceStatus(InstanceInfo instance, InstanceStatus status)
        {
            // Last argument is isReplication. As far as I can tell, true means the
            // update is not forwarded to peer Eureka nodes.
            boolean isSuccess = getRegistry().statusUpdate(instance.getAppName(), instance.getId(),
                    status, String.valueOf(System.currentTimeMillis()),
                    true);
            return (instance.getAppName() + " - " + instance.getId() + " result: " + isSuccess);
        }

        /**
         * Application Name Predicate. Eureka stores application names in upper case.
         *
         * @param applicationNameCriteria the application name to match (case-insensitive)
         * @return a predicate matching applications with that name
         */
        private Predicate<Application> hasApplication(final String applicationNameCriteria)
        {
            return application -> applicationNameCriteria.toUpperCase().equals(application.getName());
        }

        /**
         * Instance Version Predicate. Uses Eureka Instance Metadata value name "version".<br>
         *
         * Set / Bake the instance metadata map to contain a version value.<br>
         * e.g. eureka.instance.metadataMap.version=85839c2
         *
         * @param versionCriteria the version value to match
         * @return a predicate matching instances whose "version" metadata equals the given value
         */
        private Predicate<InstanceInfo> hasVersion(final String versionCriteria)
        {
            return info -> versionCriteria.equals(info.getMetadata().get("version"));
        }

        private PeerAwareInstanceRegistry getRegistry() {
            return EurekaServerContextHolder.getInstance().getServerContext().getRegistry();
        }
    }
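Once the controller is added to the Eureka Server, the endpoint can be called from any HTTP client. A sketch using java.net.http (Java 11+); the host is a placeholder and OutOfServiceCall is just an illustrative name. The parameters are URL-encoded in case an app name or version contains special characters:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class OutOfServiceCall {

    // Builds: POST http://{eurekaHost}/takeInstancesOutOfService?applicationName=...&version=...
    static HttpRequest build(String eurekaHost, String applicationName, String version) {
        String query = "applicationName=" + URLEncoder.encode(applicationName, StandardCharsets.UTF_8)
                + "&version=" + URLEncoder.encode(version, StandardCharsets.UTF_8);
        URI uri = URI.create("http://" + eurekaHost + "/takeInstancesOutOfService?" + query);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host; the request is only built and printed, not sent.
        HttpRequest request = build("localhost:8761", "FOOBAR", "8675309");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```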