I've recently set up a read replica to take some of the read load off of my Amazon multi-AZ RDS instance. The Amazon documentation clearly states that it is "up to your application to determine how read traffic is distributed across your read replicas".
Has anyone figured out a manageable way to scale read replicas? It doesn't seem like a very extensible solution to have different parts of my application hard-coded to read from specific replicas. Is there a way to set this up that is analogous to putting EC2 instances behind a load balancer?
An AWS engineer provided some insight into the question here.
Here is a snippet of his response:
in general you can load-balance traffic at the following 3 logical places:
- Application layer - create multiple connection pools and send all reads to the read-replicas.
- Web framework/middleware - some web frameworks have in-built support for multiple databases [1].
- External proxy - You can use an external proxy like MySQLproxy [2].
[1] - https://docs.djangoproject.com/en/dev/topics/db/multi-db/
[2] - https://launchpad.net/mysql-proxy
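As a rough illustration of the first two options, here is a minimal sketch of a Django-style database router that rotates reads across replicas and sends writes to the primary. The aliases `default`, `replica1` and `replica2` are assumptions for the example and would have to match the keys in your own `DATABASES` setting; this is not taken from the engineer's response.

```python
import itertools
import threading


class ReadReplicaRouter:
    """Django-style router: reads rotate over replicas, writes go to the primary."""

    # Hypothetical aliases; they must match keys in Django's DATABASES setting.
    primary_alias = "default"
    replica_aliases = ("replica1", "replica2")

    def __init__(self):
        self._cycle = itertools.cycle(self.replica_aliases)
        self._lock = threading.Lock()  # next() on a shared cycle is guarded per thread

    def db_for_read(self, model, **hints):
        # Round-robin: each read gets the next replica alias in turn.
        with self._lock:
            return next(self._cycle)

    def db_for_write(self, model, **hints):
        # Replicas are read-only, so every write goes to the primary.
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations between them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes only run against the primary and replicate from there.
        return db == self.primary_alias
```

In Django, a class like this would be listed in the `DATABASE_ROUTERS` setting. Keep in mind that replication is asynchronous, so a read issued right after a write may hit a replica that has not caught up yet.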
I think HAProxy would be a good option to load balance among multiple read replicas. You can have a config something like this:
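listen mysql-cluster
    bind 0.0.0.0:3306
    mode tcp
    balance roundrobin
    # mysql-check logs in as this user without a password, so the user must exist on every replica
    option mysql-check user root
    server db01 x.x.x.x:3306 check
    server db02 x.x.x.x:3306 check
    server db03 x.x.x.x:3306 check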
where each x.x.x.x is the endpoint of one read replica.
I've been experimenting with Route 53 weighted CNAME records to load balance RDS read replicas (and the source). I currently have 3 CNAME record sets for readdb.example.com.
The first points to the source DB at db.example.com, in case there's a replication error: the application can fall back to the original database for reads. If you want, you can also have the source carry some proportion of the read load, depending on how you set the weight. The Routing Policy is set to Weighted. I have the weight for the source set to 1, so it takes on only a small share of the read load. The TTL is set low; I've tried values from 1 to 10 seconds and left it at 10 for now. You also have to enter a Set ID, which is any unique string ("Source Database").
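To put a number on "small share": Route 53 sends each weighted record a fraction of queries equal to its weight divided by the sum of the weights in the set. Here is a quick sketch of that arithmetic. The replica's Set ID and its weight of 10 are assumed values for illustration only.

```python
def traffic_shares(weights):
    """Return each record's expected share of DNS answers, given its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {set_id: weight / total for set_id, weight in weights.items()}


# Source weight of 1 as described above; the replica entry is hypothetical.
shares = traffic_shares({"Source Database": 1, "Read Replica 1": 10})
for set_id, share in shares.items():
    print(f"{set_id}: {share:.1%}")
# Source Database: 9.1%
# Read Replica 1: 90.9%
```

These are shares of DNS answers, not of queries to the database. Clients and resolvers that cache a lookup for the length of the TTL will skew the real distribution.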
The second record set points to one of the read replicas (readdb1.blahblah.rds.amazonaws.com). The Routing Policy is Weighted and the TTL is 10, like before. It also needs a unique Set ID. I set the weight for this one between 5 and 50, depending on how much of the read load it should carry relative to the source. This one I do associate with a health check, which you have to create ahead of time. You can probably use a simple health check that points to the replica, but I did something a little different.
I put a file like this on each of my application servers (I'm using PHP Elastic Beanstalk, but you could do something similar in other setups/languages I assume):
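<?php
// Health-check endpoint: prints "GOOD!" only if the RDS instance named in ?id= is healthy.
if ($instanceid = $_GET["id"] ?? null) {
    // Ask the AWS CLI for the instance description (JSON on stdout, one array entry per line).
    exec("aws rds describe-db-instances --db-instance-identifier " . escapeshellarg($instanceid), $rdsinfo);
    $rdsinfo = json_decode(implode(' ', $rdsinfo), true);
    $instance = $rdsinfo["DBInstances"][0] ?? [];
    // Healthy means: replication status is normal AND the instance is available.
    if (($instance["StatusInfos"][0]["Normal"] ?? false) && ($instance["DBInstanceStatus"] ?? "") === "available") {
        echo "GOOD!";
    } else {
        echo "BAD!";
    }
    /* Then there's some other stuff in here that is a little unrelated to the question */
}
?>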
This file uses the AWS command line interface, which is installed on Elastic Beanstalk applications and only requires that the environment variables AWS_ACCESS_KEY_ID, AWS_DEFAULT_REGION, and AWS_SECRET_KEY be specified ahead of time (current versions of the AWS CLI read the secret from AWS_SECRET_ACCESS_KEY). Then you make a Route 53 health check that points to http://www.example.com/rdshealthcheck/rdsshealthcheck.php?id=readdb1 and set the search string to "GOOD!". I think a search string costs $1/month per health check, which seems reasonable.
If you have a second read replica, you can create another healthcheck that points to http://www.example.com/rdshealthcheck/rdsshealthcheck.php?id=readdb2 or whatever it's called.
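For setups that aren't PHP, the same decision could be sketched in Python. This is only an illustration of the check, not the author's code: the function name is made up, and it assumes you already have the JSON text printed by `aws rds describe-db-instances`, using the same fields the PHP file reads.

```python
import json


def replica_is_healthy(describe_output):
    """Return True if the first instance is available and replicating normally.

    describe_output is the JSON text printed by
    `aws rds describe-db-instances --db-instance-identifier <id>`.
    """
    data = json.loads(describe_output)
    instances = data.get("DBInstances") or []
    if not instances:
        return False
    instance = instances[0]
    status_infos = instance.get("StatusInfos") or []
    # Same two conditions as the PHP version: replication status flagged
    # normal, and the instance itself reported as "available".
    replication_ok = bool(status_infos) and bool(status_infos[0].get("Normal"))
    return replication_ok and instance.get("DBInstanceStatus") == "available"


# Hand-written, heavily trimmed sample of the CLI output (not real data).
sample = json.dumps({
    "DBInstances": [{
        "DBInstanceStatus": "available",
        "StatusInfos": [{"StatusType": "read replication", "Normal": True}],
    }]
})
print("GOOD!" if replica_is_healthy(sample) else "BAD!")
```

A small web endpoint would print that string, and the Route 53 health check would search the response body for "GOOD!" exactly as described above.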
I actually only use one read replica at this time, but it is significantly larger than my source DB. That was more economical for me because my source DB is Multi-AZ. I keep the third record set and second health check around in case the first replica gives me problems. That way, I don't have to wait for the first one to finish deleting before relaunching it. Instead, I immediately delete the first one and launch the second one using the name specified in the third record set (and second health check).
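If you'd rather script the record sets than click through the console, the sketch below builds the change batch for two of the weighted CNAMEs described in this answer. It only constructs the dictionary in the shape that boto3's `change_resource_record_sets` call accepts as `ChangeBatch`. The helper name, the replica weight of 10, the replica's Set ID and the health check ID are placeholders, not values from my setup.

```python
def weighted_cname_change(name, target, set_id, weight, ttl=10, health_check_id=None):
    """Build one UPSERT change for a weighted CNAME record set."""
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,  # any string unique within the weighted set
        "Weight": weight,
        "TTL": ttl,  # seconds
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id is not None:
        # Only records tied to a health check are pulled out of rotation on failure.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Changes": [
        # Source database: low weight, no health check.
        weighted_cname_change("readdb.example.com", "db.example.com",
                              "Source Database", weight=1),
        # Read replica: higher weight, tied to a (placeholder) health check ID.
        weighted_cname_change("readdb.example.com",
                              "readdb1.blahblah.rds.amazonaws.com",
                              "Read Replica 1", weight=10,
                              health_check_id="hypothetical-health-check-id"),
    ]
}
```

With boto3 installed and credentials configured, this would be passed as `ChangeBatch=change_batch` to a Route 53 client's `change_resource_record_sets`, along with your own `HostedZoneId`.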