Switch a live domain to Google Cloud Run without downtime

I have Google Cloud Run set up on my custom cloud run domain, foo-eu7vrotrfq-uc.a.run.app. I have a domain, foo.com, that is currently serving live traffic. I want to start serving foo.com on Cloud Run without disruption to ~100 concurrent users.

It seems like this is impossible with the current domain mapping feature. Domain mapping requires that DNS is updated in order for the certificate to be issued. According to the documentation, this takes up to 15 minutes (took about 5 minutes in my test). During these 15 minutes, foo.com will not serve correctly.

Here are some ideas:

  • Set up the certificate for cloudrun.foo.com and then CNAME foo.com to cloudrun.foo.com. --> Google returns an error presumably because the hostname is not recognized.
  • If Domain Mapping doesn't check DNS records but just needs to expose the Let's Encrypt challenge, write a server that proxies the challenge to Cloud Run and all other traffic to the current web server. --> This is a lot of work and depends on internal implementation details of the Domain Mapping feature. I actually tried this using a Cloudflare Worker, but it looks like a DNS change is required.

Has anyone figured out a workaround for this problem? It seems like there is no way to switch to Cloud Run for existing domains without incurring downtime.

asked Dec 09 '19 19:12 by ty.



1 Answer

It's a tough gig to pull off, but I think you have several options. In summary:

  1. Native solution: Register the domain with Cloud Run, wait until Cloud Run recognizes it, and flip DNS as the last step. There will be downtime because Cloud Run needs to get an HTTPS cert from Let's Encrypt.

  2. Cloudflare proxying (with Host header rewrite, which is an Enterprise plan feature), likely no downtime.

What makes this situation really hard is HTTPS. Cloud Run currently does not allow uploading your own TLS certificates. If it did, it could start serving traffic right away (and you could flip to a Cloud Run-managed cert later on).


Option 1

Keep in mind that DNS records, by their nature, will take several hours to propagate across the globe and to residential/edge locations. You need both OLD and NEW endpoints running at all times for maybe 24 hours.
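How long the overlap lasts depends largely on the TTL of the existing records, since resolvers may keep serving the old answer until it expires. A minimal sketch for checking that, assuming `dig` is installed. `min_ttl` is a helper name made up for this example, and lowering the TTL a day or so before the switch is a common practice rather than something the steps here require:

```shell
# Hypothetical helper: read `dig +noall +answer` output on stdin and print
# the smallest TTL (second column) among the answer records.
min_ttl() {
  awk 'NF >= 5 {
         ttl = $2 + 0
         if (min == "" || ttl < min) min = ttl
       }
       END { if (min != "") print min }'
}

# Usage (needs network access):
#   dig +noall +answer foo.com A | min_ttl
```

A recursive resolver reports the time remaining in its cache, so asking your authoritative name server is the more reliable way to see the configured value.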

First, make sure you create a Domain Mapping on Cloud Console for your Cloud Run app.

This operation will most likely reveal that you need to verify domain ownership through Google Webmaster Tools (now Search Console). That alone may take some time, so do it now.

Once you are able to create the Domain Mapping, it will give you a set of DNS records to add to your domain. Do not update your domain's DNS records just yet.
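The same mapping can probably be created from the command line instead of Cloud Console. A sketch, assuming the `gcloud` CLI with its beta component is installed. The service name `foo` and region `us-central1` are guesses based on the run.app URL, not values given in the question:

```shell
SERVICE=foo            # assumed service name
REGION=us-central1     # assumed region
DOMAIN=foo.com

# Create the domain mapping (the domain must already be verified).
create_mapping() {
  gcloud beta run domain-mappings create \
    --service "$SERVICE" --domain "$DOMAIN" --region "$REGION"
}

# Print the mapping; the output should include the DNS records
# that Cloud Run expects you to publish.
show_mapping() {
  gcloud beta run domain-mappings describe \
    --domain "$DOMAIN" --region "$REGION"
}
```

Run `create_mapping` once, then `show_mapping` whenever you need to look up the records again.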

At this point, Google Cloud's load balancing frontends are being configured to route traffic that comes to your foo.com domain name to your app:

curl -vH "Host: foo.com" http://216.239.32.21

< HTTP/1.1 302 Found
< Location: https://foo.com/

It seems like Cloud Run now recognizes foo.com exists. Instead of failing with HTTP 404, it is forcing an https:// redirect.

However, Cloud Run cannot yet get a TLS certificate for your domain from Let's Encrypt, because Let's Encrypt cannot reach foo.com on Cloud Run to verify the challenge: DNS is still pointing to your old servers.

When you try to query one of these IPs by faking the Host header and using https://, you'll see:

curl -kvH "Host: foo.com" https://216.239.32.21

curl: (35) error:14004410:SSL routines:CONNECT_CR_SRVR_HELLO:sslv3 alert handshake failure

This error means Cloud Run has not yet successfully retrieved a certificate from Let's Encrypt and started to use it.

At this point, you have to point your domain to IP addresses provided by Cloud Run, and there will be some downtime until Cloud Run gets a cert from Let's Encrypt (as it will keep retrying). But this may take some time: 5, 10, 20 mins, hard to guarantee. Remember that DNS records are heavily cached, so this can take even longer.
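To know when the downtime window has ended, you could poll one of the Cloud Run IPs until the handshake succeeds. This is a sketch, not an official procedure: `probe_cert` and `wait_for_cert` are names invented here, and the retry count and delay are arbitrary. It uses curl's `--resolve` so the request carries the real hostname while connecting to a specific IP, and it drops `-k` so that a missing or invalid certificate counts as a failure:

```shell
# Succeeds once a valid certificate for domain $1 is served at IP $2.
# Only the TLS handshake matters here; any HTTP status counts as success.
probe_cert() {
  curl --silent --output /dev/null --max-time 10 \
    --resolve "$1:443:$2" "https://$1/"
}

# wait_for_cert DOMAIN IP [TRIES] [DELAY_SECONDS]
# Set PROBE to another command to replace the curl-based check.
wait_for_cert() {
  wfc_domain=$1
  wfc_ip=$2
  wfc_tries=${3:-60}
  wfc_delay=${4:-30}
  wfc_i=0
  while [ "$wfc_i" -lt "$wfc_tries" ]; do
    if ${PROBE:-probe_cert} "$wfc_domain" "$wfc_ip"; then
      echo "certificate for $wfc_domain is being served"
      return 0
    fi
    wfc_i=$((wfc_i + 1))
    sleep "$wfc_delay"
  done
  echo "gave up after $wfc_tries attempts" >&2
  return 1
}

# Usage (needs network access):
#   wait_for_cert foo.com 216.239.32.21
```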


Option 2

If you use Cloudflare as your load balancer, you can use Page Rules to rewrite the Host header. This is available only in their Enterprise plan. With this, any request to foo.com is proxied to your Cloud Run app with its Host header rewritten to the app's own hostname, such as foo-eu7vrotrfq-uc.a.run.app.

This doesn't use the Cloud Run "domain mapping" feature, so your Cloud Run setup wouldn't know about your foo.com domain at all.
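To see why the Host rewrite is needed, you can compare how the run.app endpoint answers with and without the original hostname. A sketch: `status_for_host` is a made-up helper, and the exact status codes you get back are not guaranteed here:

```shell
# Hypothetical helper: print the HTTP status returned by URL ($1) when the
# request is sent with the Host header set to $2.
# Set CURL to another command to replace the real curl binary.
status_for_host() {
  ${CURL:-curl} --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' --header "Host: $2" "$1"
}

# Usage (needs network access):
#   status_for_host https://foo-eu7vrotrfq-uc.a.run.app/ foo-eu7vrotrfq-uc.a.run.app
#   status_for_host https://foo-eu7vrotrfq-uc.a.run.app/ foo.com
```

If the second call fails while the first succeeds, the proxy has to send the run.app hostname to the origin, which is what the Page Rule rewrite does.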

If you're already using Cloudflare, this is a much smoother transition, and you can quickly revert using Cloudflare Page Rules if something goes wrong.

However, if you aren't currently using Cloudflare, follow these guides to avoid downtime, because, like Cloud Run, Cloudflare needs to provision a certificate for your domain name:

  • How to eliminate (or minimise) downtime when adding your domain to Cloudflare
  • Migrate HTTPS Enabled non-top TLD Domain to Cloudflare without Downtime

I think overall you pose a good question, and thanks for thoroughly explaining it. Your analysis is correct.

Since Cloud Run forces https:// and Let's Encrypt needs to reach your application on Cloud Run before it will issue a TLS certificate (and Cloudflare similarly takes time to provision a cert for your domain), this is not easy.

I'm taking this feedback back to the team to discuss. Maybe we need a different way to provision TLS certificates for domains to prevent downtime during migration. I might write a guide on this.

answered Sep 29 '22 06:09 by ahmet alp balkan