
Are there disadvantages of sharding a tracking table by date?

We have a MySQL tracking database with ~100M rows. We generally run queries grouped by day for certain actions, unique visits, etc. The problem is that the queries that generate the monthly report are getting slow because of the interplay of the indexes (we have a range scan on the date, then filters on multiple other fields).

To improve performance, we've switched to doing unions based on the date to avoid the range scan and the performance is much better. So the idea was presented that perhaps we should just shard by day with a different table for each day. The advantages seem to be:

  • fast inserts - each day the table is new/small so it's always fast.
  • deleting old data is simple (instead of deleting 5M rows from a 100M row table, we can just drop a table)
  • Our current approach is doing unions anyway - so we're just unioning different tables instead of different values from one table.
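For concreteness, the per-day layout might look like the sketch below. The table and column names (`tracking_template`, `tracking_20140906`, `visitor_id`, `action`) are hypothetical and not taken from the actual schema.

```sql
-- Hypothetical template holding the shared column and index definitions.
CREATE TABLE tracking_template (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    visitor_id BIGINT UNSIGNED NOT NULL,
    action     VARCHAR(32)     NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id),
    KEY idx_action_visitor (action, visitor_id)
) ENGINE=InnoDB;

-- One table per day, created ahead of time (for example by a nightly job).
CREATE TABLE tracking_20140906 LIKE tracking_template;
CREATE TABLE tracking_20140907 LIKE tracking_template;

-- Report across days: each day in the range needs its own SELECT branch,
-- so the application has to build this statement dynamically.
SELECT visit_day, COUNT(DISTINCT visitor_id) AS unique_visits
FROM (
    SELECT '2014-09-06' AS visit_day, visitor_id
    FROM tracking_20140906 WHERE action = 'visit'
    UNION ALL
    SELECT '2014-09-07' AS visit_day, visitor_id
    FROM tracking_20140907 WHERE action = 'visit'
) AS per_day
GROUP BY visit_day;

-- Purging a day drops the whole table instead of running a large DELETE.
DROP TABLE tracking_20140906;
```

Two costs show up even in this small sketch: the query text changes with the date range (a monthly report unions roughly 30 tables), and `id` values are unique only within each day's table.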

Has anyone heard of or tried this approach? Are there any foreseeable problems?

Note: We are considering other NoSQL approaches, but we would like to know if this is a valid approach if we decide to stay with MySQL (so please don't suggest "try XYZ NoSQL DB"). Also, I understand we can just get a much better machine and that, as data sets go, this is not that big. But we don't want to waste money on a larger machine if a smaller one will work without a lot of extra work.

Yehosef asked Sep 07 '14 07:09




1 Answer

You may want to take a look at MySQL partitioning.

Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be a column value, a function acting on one or more column values, or a set of one or more column values, depending on the type of partitioning that is used.
In your case, a hash partition on the month part of the date could be useful.

-- Example table: rows are spread across 12 partitions by the month of tr_date.
CREATE TABLE Mydata (id INT, amount DECIMAL(7,2), tr_date DATE)
    ENGINE=INNODB
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 12;

The partitioning solution is conceptually the same as yours, but the RDBMS will handle many aspects for you.
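If cheaply dropping old data is one of the goals, RANGE partitioning may be a closer match than HASH. With HASH on `MONTH(tr_date)`, the same month of different years lands in the same partition, and `ALTER TABLE ... DROP PARTITION` applies only to RANGE and LIST partitions. Below is a minimal sketch under assumed names (`tracking`, `action`, and the partition names are hypothetical), using monthly partitions; daily ones would follow the same pattern.

```sql
-- Hypothetical table. MySQL requires the partitioning column to be part of
-- every unique key, so the primary key includes tr_date.
CREATE TABLE tracking (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    action  VARCHAR(32)     NOT NULL,
    tr_date DATE            NOT NULL,
    PRIMARY KEY (id, tr_date),
    KEY idx_action (action)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(tr_date)) (
    PARTITION p201408 VALUES LESS THAN (TO_DAYS('2014-09-01')),
    PARTITION p201409 VALUES LESS THAN (TO_DAYS('2014-10-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Check which partitions a date-range query touches (see the "partitions"
-- column; on MySQL 5.6 and earlier, use EXPLAIN PARTITIONS instead).
EXPLAIN SELECT action, COUNT(*) FROM tracking
WHERE tr_date >= '2014-09-01' AND tr_date < '2014-10-01'
GROUP BY action;

-- Remove a whole month at once instead of deleting rows.
ALTER TABLE tracking DROP PARTITION p201408;

-- Make room for the next month by splitting the catch-all partition.
ALTER TABLE tracking REORGANIZE PARTITION pmax INTO (
    PARTITION p201410 VALUES LESS THAN (TO_DAYS('2014-11-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

Unlike the table-per-day approach, report queries stay the same as the date range changes; the server decides which partitions to read.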

Mohsen Heydari answered Sep 28 '22 02:09
