I have an Amazon RDS database instance, and I'm using a read replica for analytics. However, every week or so the read replica crashes with a replication error.
I've tried looking at the slave status and skipping replication errors as per this help article. However, I've only been able to restore it by fiddling around and creating a new read replica.
This is problematic because external services depend on the original read replica.
The main database is fine, but it seems some integrity errors cause the read replica to crash and not recover.
Currently my read replica has these parameters:
Replication State: Error
Replication Error: Error 'Cannot add or update a child row: a foreign key constraint fails.....
Is there a way I can configure this read replica to skip all errors? I'm just trying to figure out how to make it more stable. Thanks!
You should never, ever skip a replication error until you have understood what is causing the error, repaired the underlying problem, and corrected any data inconsistencies between the master and replica.
With each error you skip, the divergence increases between your master and replica's data sets, and any divergence is unacceptable.
You have no real alternative but to create a new replica instance and discard the old one. If a replication error occurs on the new one, stop, figure out why, and fix whatever is being done incorrectly in your configuration or your application to cause the errors.
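As a rough sketch of the "stop and figure out why" step, these are the kinds of statements you could run on the replica before doing anything else. SHOW SLAVE STATUS is standard MySQL (newer versions also accept SHOW REPLICA STATUS). The table name `parent` and the id value are hypothetical placeholders for whatever the failing statement actually references.

```sql
-- Run on the read replica, not on the master.
-- Fields worth reading: Slave_IO_Running, Slave_SQL_Running,
-- Last_SQL_Errno, Last_SQL_Error, Seconds_Behind_Master.
SHOW SLAVE STATUS;

-- Last_SQL_Error normally includes the statement that failed.
-- For a foreign key failure, check whether the referenced parent row
-- exists on the replica. `parent` and 123 are hypothetical; substitute
-- the table and key value from the failing statement.
SELECT id FROM parent WHERE id = 123;
```

If the row exists on the master but not on the replica, the two data sets have already diverged, and the question becomes which earlier write never made it to the replica.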
Skipping replication errors on RDS for MySQL should be considered an emergency stop-gap measure only, unless you have a thorough understanding of the internals of MySQL replication, because in a correct setup replication errors are rare.
It turns out the source of the problem is the storage engine. From the Amazon RDS FAQ here: http://aws.amazon.com/rds/faqs/#130
Amazon RDS for MySQL Read Replicas require a transactional storage engine and are only supported for the InnoDB storage engine. Non-transactional MySQL storage engines such as MyISAM might prevent Read Replicas from working as intended. However, if you still choose to use MyISAM with Read Replicas, we advise you to watch the Amazon CloudWatch “Replica Lag” metric (available via the AWS Management Console or Amazon CloudWatch APIs) carefully and recreate the Read Replica should it fall behind due to replication errors. The same considerations apply to the use of temporary tables and any other non-transactional engines.
We were using MyISAM, having switched away from InnoDB tables for other reasons. So we don't quite have an answer here, because it seems InnoDB gives us one problem and MyISAM gives us another. We'll have to dive deeper to figure this out, but it seems we need a transactional storage engine to make read replicas work consistently and properly.
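For anyone checking their own setup, here is a minimal sketch of how to find the tables that are not on InnoDB, using the standard information_schema.tables view. The schema and table names in the ALTER statement are hypothetical, and converting only makes sense if whatever pushed you off InnoDB has been resolved.

```sql
-- Run on the master: list user tables that use a non-InnoDB engine.
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND engine <> 'InnoDB'
  AND table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');

-- Convert one table back to InnoDB. mydb.mytable is a hypothetical name.
-- This rebuilds the table, so try it on a copy of the data first.
ALTER TABLE mydb.mytable ENGINE = InnoDB;
```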
I solved it by creating a MySQL scheduled event like this:
CREATE EVENT repl_error_skipper
ON SCHEDULE
    EVERY 15 MINUTE
COMMENT 'Calling rds_skip_repl_error to skip replication error'
DO
    CALL mysql.rds_skip_repl_error;
/* you can also add other logic; more than one statement needs a BEGIN ... END body */
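One thing to keep in mind: a scheduled event only fires when the event scheduler is running. A small sketch for checking that, and for removing the event again, is below. It uses only standard MySQL statements. On RDS I would expect the event_scheduler setting to be changed through the DB parameter group rather than with SET GLOBAL, but treat that as an assumption and verify it for your instance.

```sql
-- Is the event scheduler running? The event does nothing unless this is ON.
SHOW VARIABLES LIKE 'event_scheduler';

-- List the events defined in the current schema, including their status.
SHOW EVENTS;

-- Remove the skipper once the underlying problem has been fixed.
DROP EVENT IF EXISTS repl_error_skipper;
```

Note that every error this event skips is a write the replica never applied, so the replica's data keeps drifting away from the master's.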