Using read replication in MySQL

Tags: mysql, scaling, replication, database-replication

I have a MySQL database that receives around 150 million inserts per day, with a retention period of around 60 days.

1. Each record is indexed on id.
2. Every update works as follows:
   1. Check whether the record is present. If it is, update it with the new data.
   2. Otherwise, create the record.
3. Delete records that were created more than 60 days ago.
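
A minimal sketch of the write path described above, in MySQL syntax. The table name `records` and the columns `payload` and `created_at` are hypothetical, and `id` is assumed to be the primary key, which is what `ON DUPLICATE KEY UPDATE` relies on.

```sql
-- Hypothetical table; only `id` is indexed, as described above.
CREATE TABLE records (
  id         BIGINT UNSIGNED NOT NULL,
  payload    VARCHAR(255)    NOT NULL,
  created_at DATETIME        NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- "Update if present, otherwise create" in a single statement.
INSERT INTO records (id, payload, created_at)
VALUES (42, 'new data', NOW())
ON DUPLICATE KEY UPDATE payload = VALUES(payload);

-- Retention: purge expired rows in small batches to keep each
-- transaction short. Repeat until no rows are affected.
DELETE FROM records
WHERE created_at < NOW() - INTERVAL 60 DAY
LIMIT 10000;
```

With an index on `id` only, the `DELETE` has to scan the table to find expired rows, so the batch size of 10000 is just an illustrative value. Recent MySQL 8.0 releases deprecate `VALUES(col)` in the update clause in favor of a row alias.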

My main use case is as follows:

Run some bulk queries, e.g.:

SELECT * FROM table WHERE prop = val1 AND prop2 = val2 etc.

Each such query will return a large number of records, e.g. 1M.

Is the following approach good?

1. Have a master DB with an index on id only, with a retention of 60 days.
2. Have a read replica DB. This DB will be indexed on many columns.
3. Run all bulk queries against the read replica DB.

Is this a good solution?

EDIT: I plan to use Amazon RDS and found this in their documentation:

Q: Can my Read Replicas only accept database read operations?

Read Replicas are designed to serve read traffic. However, there may be use cases where advanced users wish to complete Data Definition Language (DDL) SQL statements against a Read Replica. Examples might include adding a database index to a Read Replica that is used for business reporting, without adding the same index to the corresponding source DB Instance. If you wish to enable operations other than reads for a given Read Replica, you will need to modify the active DB Parameter Group for the Read Replica, setting the “read_only” parameter to “0.”
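
As a sketch of what that documentation passage describes: once `read_only` is set to `0` in the replica's DB Parameter Group, an extra index could be added on the replica only. The table name `records` and the index name `idx_prop_prop2` are hypothetical; `prop` and `prop2` are the filter columns from the bulk query above.

```sql
-- Run on the read replica, not on the source instance.
-- Confirm the parameter group change has taken effect
-- (the value should be OFF once read_only = 0 is applied).
SHOW VARIABLES LIKE 'read_only';

-- Secondary index that exists only on the replica, for the bulk queries.
ALTER TABLE records ADD INDEX idx_prop_prop2 (prop, prop2);

-- Check whether the bulk query picks up the new index.
EXPLAIN SELECT * FROM records WHERE prop = 'val1' AND prop2 = 'val2';
```

On a table of this size the `ALTER TABLE` itself may take a long time, during which the replica can fall behind the source.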


asked Sep 03 '13 11:09

user93796

2 Answers

To answer your question:

Is following approach good:

Have a master DB with index on id only. Have a retention of 60 days.

Have Read Replica DB. This DB will be indexed on many columns

All bulk queries will be run against read replica DB.

Is this a good solution?

Updated

In my opinion and experience, no.

Technically, this solution may work, but in practice it is not suitable for production use. MySQL's built-in master-slave replication works only if the table in the slave database has the same layout as the table in the master database.

You will have approximately 9 billion records (150 million per day x 60 days). My estimate is that this could take up to 1 TB on disk (assuming each record is about the size of a tweet). 150 million inserts and 150 million deletes (of expired records) per day will surely fragment the indexes and slow down inserts, requiring frequent rebuilds.
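
The estimate can be reproduced with a quick calculation. This is only a sketch: the 140 bytes per row is an assumption standing in for "the size of a tweet", index overhead is ignored, and the schema and table names in the second query are hypothetical.

```sql
-- Back-of-the-envelope sizing: 150 million rows/day kept for 60 days,
-- assuming ~140 bytes per row, before any index overhead.
SELECT
  150000000 * 60                      AS total_rows,
  150000000 * 60 * 140 / POW(1024, 4) AS approx_tib;

-- On a live table, information_schema reports the actual data and
-- index size plus free (reclaimable) space, a rough fragmentation hint.
SELECT table_name,
       data_length  / POW(1024, 3) AS data_gib,
       index_length / POW(1024, 3) AS index_gib,
       data_free    / POW(1024, 3) AS free_gib
FROM information_schema.tables
WHERE table_schema = 'mydb' AND table_name = 'records';
```

The first query gives 9 billion rows and roughly 1.1 TiB of raw row data under those assumptions. Every additional secondary index on the replica adds to that.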

Things will get progressively more complicated when you need more than one read replica, which is a natural evolution of such a system.

If you have 150 million inserts a day, you should consider a NoSQL database. MongoDB used to support InnoDB as well, not sure if it still does.

If you wish to stick with an RDBMS like MySQL, you should use a strategy such as database sharding. With this strategy, you segment your data in such a way that the load gets distributed across a cluster of MySQL instances.
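
One common way to segment the data is to hash the record id and take it modulo the number of shards. The following is only an illustration of that routing rule, written in SQL for consistency. The shard count of 4 and the sample ids are assumptions, and in practice the application would compute the shard before choosing which MySQL instance to connect to.

```sql
-- Hypothetical routing rule for 4 shards: hash the record id and take
-- the result modulo the shard count.
SELECT CRC32('record-12345') MOD 4 AS shard_no;

-- For numeric ids the hash can be skipped.
SELECT 12345 MOD 4 AS shard_no;  -- 12345 = 4 * 3086 + 1
```

A plain modulo scheme remaps most keys when the shard count changes, so the number of shards is worth choosing with growth in mind.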

A slightly less scalable alternative to sharding is to use a storage engine such as MyISAM. MyISAM is not ACID compliant (it does not support transactions) but offers great performance. It supports concurrent inserts.

answered Oct 25 '22 14:10

Litmus

Consider using FastBit if your primary use is SELECT * with no joins and multiple filters on different columns. FastBit implements WAH-compressed bitmap indexes that can be evaluated very efficiently, and it stores data as a column store.

https://sdm.lbl.gov/fastbit/

For MySQL, perhaps consider TokuDB, which has 'clustered' index support, or create covering indexes in InnoDB. This is really only effective if you have a small number of attribute combinations to filter on. If not, consider FastBit.
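
A minimal sketch of a covering index in InnoDB. The table `records`, the column `payload`, and the index name are hypothetical; `prop` and `prop2` stand for the filter columns in the question.

```sql
-- The index covers a query only if every selected column is in the index
-- (InnoDB secondary indexes also carry the primary key columns implicitly).
ALTER TABLE records ADD INDEX idx_cover (prop, prop2, payload);

-- If the index covers the query, the Extra column of EXPLAIN
-- shows "Using index".
EXPLAIN
SELECT id, prop, prop2, payload
FROM records
WHERE prop = 'val1' AND prop2 = 'val2';
```

For a true SELECT * the index would have to contain every column of the table, which is effectively what a TokuDB clustered index provides.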

If you always filter on the same attributes, then you can consider using Flexviews: http://flexvie.ws

You could create a view for:

SELECT * FROM table WHERE val1=X AND val2=Y

Or just roll your own version. After loading data, run:

REPLACE INTO summary_table_v2v2 SELECT * FROM table WHERE val1=X AND val2=Y AND table.last_update > NOW()-INTERVAL 1 DAY;

That will "refresh" the summary table with any changes made in the last day, assuming last_update is a timestamp column.
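
A slightly fuller sketch of the roll-your-own variant, including the one-time setup. The table names `records` and `summary_val1_val2` are hypothetical, and `X` and `Y` are placeholder values.

```sql
-- One-time setup: copy the source table's structure, including its
-- primary key. REPLACE relies on a PRIMARY KEY or UNIQUE index to
-- overwrite existing rows; without one it behaves like a plain INSERT
-- and duplicates accumulate.
CREATE TABLE summary_val1_val2 LIKE records;

-- After each load: pull in matching rows that changed in the last day.
REPLACE INTO summary_val1_val2
SELECT *
FROM records
WHERE val1 = 'X'
  AND val2 = 'Y'
  AND last_update > NOW() - INTERVAL 1 DAY;
```

This only adds or overwrites rows. Rows that were deleted from the source, or that no longer match the filter, stay in the summary table until they are removed separately.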

answered Oct 25 '22 14:10

Justin Swanhart
