We have experienced a problem with AWS Aurora failover and looking for pointers as to how to resolve.
Scenario
AWS Aurora set up with two end points:
So therefore our PDO MySQL Connection string is stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com
(for writing)
After failover
On failover, the DNS entries are flipped to point as follows:
Critically, the PDO Connection string (for writing) remains the same "stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com" b ut points to a different IP address.
What Happened
We had error 1290 "SQLSTATE[HY000]: General error: 1290 The MySQL server is running with the --read-only option so it cannot execute this statement".
As the DB engines are stopped started, our initial persistent connections will have "gone away" and been invalidated (something we immediately handle in a reconnect/retry code).
However the error above means new connections will have been made to the old node, but then not further invalidated with propagation of the DNS change. They lasted 10/15 minutes (well beyond TTL of the DNS).
My Questions
Side notes
We already have code in place to handle the retry, and they retry is told to get a new non-persistent connection (which works). It's at this point we could "invalidate" the PDO connection so the next run of a script does not repeat this cycle over and over.
The failover can happen at any time, so we're not in a position to do manual actions such as restart php (as we had to do this time).
Without persistent connections, performance is notably slower.
FastCGI, Centos 16, PHP 7.2, MySQLD 5.0.12-dev (which is normal on Centos - see https://superuser.com/questions/1433346/php-shows-outdated-mysqlnd-version)
The biggest drawback to persistent connections is that it limits the number of users you can have browsing your site: if MySQL is configured to only allow 10 concurrent connections at once then when an 11th person tries to browse your site it won't work for them. PDO does not manage the persistence.
The persistent connection cache allows you to avoid the overhead of establishing a new connection every time a script needs to talk to a database, resulting in a faster web application.
With MySQLi, to close the connection you could do: $this->connection->close(); However with PDO it states you open the connection using: $this->connection = new PDO();
Instead of provisioning and managing database servers, you specify Aurora capacity units (ACUs). Each ACU is a combination of approximately 2 gigabytes (GB) of memory, corresponding CPU, and networking.
Persistent connections must be terminated and restarted.
Reminds me of a 2-minute TTL that took 20 minutes to be recognized. I don't know whether Amazon does a better job, or even if they have any say in DNS.
5.0.12?? That was released in 2005! Maybe a typo. Anyway, I don't think the version matters in this Question.
DNS may not be the optimal way to failover; there are several Proxy servers out there. I would expect them to flip within seconds. However, they need to know who's who rather than depending on DNS.
Can you modify the code to disconnect+reconnect when that error occurs? (It may not help.)
Unfortunately, this error is documented:
https://github.com/jeremydaly/serverless-mysql/issues/7
everything said revolves around migrating to: mysqlnd
driver for mysqlnd_ms
I will continue looking for a more efficient solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With