 

Using read replication in MySQL

I have a MySQL DB which gets around 150 million inserts per day, and the retention period is around 60 days.

  1. Each record is indexed on id.
  2. Every time an update happens, the flow is as follows (a rough SQL sketch follows this list):
    1. Check if the record is present. If it is, update it with the new data.
    2. Otherwise, insert a new record.
  3. Delete records that were created more than 60 days ago.
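
For illustration, the update flow above could be expressed as a single upsert plus a periodic purge. The table and column names below (events, id, data, created_at) are made up for this sketch:

    -- Upsert: insert the record, or overwrite it if the id already exists
    -- (assumes id is the PRIMARY KEY or a UNIQUE index).
    INSERT INTO events (id, data, created_at)
    VALUES (123, 'payload', NOW())
    ON DUPLICATE KEY UPDATE data = VALUES(data);

    -- Retention: purge records created more than 60 days ago.
    DELETE FROM events
    WHERE created_at < NOW() - INTERVAL 60 DAY;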

My main use case is as follows:

Run some bulk queries, e.g.:

    SELECT * FROM table WHERE prop1 = val1 AND prop2 = val2 ...

This will return a large number of records, e.g. 1 million.

Is the following approach good?

  1. Have a master DB with an index on id only, with a retention of 60 days.
  2. Have a read replica DB. This DB will be indexed on many columns.
  3. Run all bulk queries against the read replica DB.

Is this a good solution?

EDIT: I plan to use Amazon RDS and found this in their documentation:

 Q: Can my Read Replicas only accept database read operations?

Read Replicas are designed to serve read traffic. However, there may be use cases where advanced users wish to complete Data Definition Language (DDL) SQL statements against a Read Replica. Examples might include adding a database index to a Read Replica that is used for business reporting, without adding the same index to the corresponding source DB Instance. If you wish to enable operations other than reads for a given Read Replica, you will need to modify the active DB Parameter Group for the Read Replica, setting the “read_only” parameter to “0.”
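
If I go that route, my understanding is that after setting read_only to 0 through the replica's DB Parameter Group (done via the RDS console or CLI), the extra reporting indexes would be added on the replica only, along these lines (table and column names are made up):

    -- Run against the read replica only; the master keeps just the index on id.
    ALTER TABLE events
      ADD INDEX idx_prop1_prop2 (prop1, prop2);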

asked Sep 03 '13 by user93796




2 Answers

To answer your question:

Is the following approach good?

  1. Have a master DB with an index on id only, with a retention of 60 days.
  2. Have a read replica DB. This DB will be indexed on many columns.
  3. Run all bulk queries against the read replica DB.

Is this a good solution?

Updated

In my opinion and experience, No.

Technically, this solution may work, but practically it is not suitable for production use. MySQL's built-in master-slave replication works only if the table in the slave database has the same layout as the table in the master database.

You will have approximately 9 billion records (150 million x 60 days). My estimate is that on disk this could take up to 1 TB (each record roughly the size of a tweet). 150 million inserts and 150 million deletes (of expired records) per day will surely fragment the indexes and slow inserts down, requiring frequent rebuilds.
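
If you go down this path anyway, such a rebuild would look something like the statement below (the table name is made up; for InnoDB, OPTIMIZE TABLE is mapped to a full table rebuild, which is very expensive on a table this size):

    -- Rebuilds the table and its indexes to defragment them after heavy insert/delete churn.
    OPTIMIZE TABLE events;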

Things will get incrementally more complicated when you need more than one read replica, a natural evolution of the ecosystem.

If you have 150 million inserts a day, you should consider a NoSQL database. MongoDB used to support InnoDB as well; not sure if it still does.

If you wish to stick to an RDBMS like MySQL, you should use a strategy such as database sharding. In this strategy, you segment your data in such a way that the load gets distributed across a cluster of MySQL instances.
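
As a minimal sketch of one possible sharding rule (purely illustrative, not a complete design), each record could be routed to one of N MySQL instances by a modulus of its id, with the same table definition created on every shard:

    -- Illustrative only: with 4 shards, the application routes each id like this
    -- and opens a connection to the matching MySQL instance.
    SELECT 987654321 % 4 AS shard_no;   -- returns 1, so this record lives on shard_1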

A slightly less scalable option than sharding is to use a storage engine such as MyISAM. MyISAM is not fully ACID compliant but offers great performance, and it supports concurrent inserts.
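
A minimal sketch of what that could look like (the table definition is made up; concurrent_insert is the server setting that controls MyISAM concurrent inserts):

    -- MyISAM variant of the table: no transactions and table-level locks,
    -- but inserts can run concurrently with SELECTs at the end of the data file.
    CREATE TABLE events_myisam (
      id         BIGINT UNSIGNED NOT NULL PRIMARY KEY,
      prop1      VARCHAR(64),
      prop2      VARCHAR(64),
      data       TEXT,
      created_at DATETIME NOT NULL
    ) ENGINE=MyISAM;

    -- 2 = ALWAYS: allow concurrent inserts even when the table has holes from deletes.
    SET GLOBAL concurrent_insert = 2;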

answered Oct 25 '22 by Litmus


Consider using FastBit if your primary use is SELECT * with no joins and multiple filters on different columns. FastBit implements WAH-compressed bitmap indexes that can be evaluated very efficiently, and it stores data as a column store.

https://sdm.lbl.gov/fastbit/

For MySQL, perhaps consider TokuDB, which has 'clustered' index support, or creating covering indexes in InnoDB. This is really only effective if you have a small set of attribute combinations to filter on. If not, consider FastBit.
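
For example, a covering index for the bulk query above might look like this (names are made up; an InnoDB secondary index implicitly includes the primary key, so a query that reads only these columns can be answered from the index alone):

    ALTER TABLE events ADD INDEX idx_cover_prop1_prop2 (prop1, prop2, created_at);

    -- EXPLAIN shows "Using index" when the query is fully covered by the index:
    EXPLAIN SELECT id, created_at
    FROM events
    WHERE prop1 = 'val1' AND prop2 = 'val2';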

If you always filter on the same attributes, then you can consider using Flexviews: http://flexvie.ws

You could create a view for:

    SELECT * FROM table WHERE val1 = X AND val2 = Y

Or just roll your own version. After loading data, do:

    REPLACE INTO summary_table_v2
    SELECT * FROM table
    WHERE val1 = X AND val2 = Y
      AND table.last_update > NOW() - INTERVAL 1 DAY;

That will "refresh" the table with any changes made in the last day, assuming last_update is a timestamp column.
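
One detail worth spelling out (the names below are made up): the summary table needs the same layout as the source and a unique key on id, so that REPLACE overwrites changed rows instead of inserting duplicates, and the reporting indexes can live on the summary copy only:

    -- One-time setup: clone the source layout, including its PRIMARY KEY,
    -- then add reporting indexes on the summary copy only.
    CREATE TABLE summary_table_v2 LIKE source_table;
    ALTER TABLE summary_table_v2 ADD INDEX idx_val1_val2 (val1, val2);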

answered Oct 25 '22 by Justin Swanhart