 

What techniques are most effective for dealing with millions of records?


I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query take minutes to execute. I ended up making partitions, separating the data into a couple of tables. What I'm asking is: are there any patterns or design techniques for handling this kind of problem (a huge number of records)? Are MSSQL or Oracle better at handling lots of records?

P.S. The COUNT(*) problem stated above is just an example case; in reality the app does CRUD functionality and some aggregate queries (for reporting), but nothing really complicated. It's just that some of these queries take quite a while (minutes) to execute because of the table's volume.
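(For anyone wondering what I mean by partitions: something along these lines, using MySQL's built-in RANGE partitioning. This is just a rough sketch with made-up table and column names, not my actual schema.)

    -- Split a large table by year; each partition can be scanned or dropped on its own.
    CREATE TABLE orders (
        id         BIGINT UNSIGNED NOT NULL,
        created_at DATE NOT NULL,
        amount     DECIMAL(10,2),
        PRIMARY KEY (id, created_at)   -- the partitioning column must be part of every unique key
    )
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2006 VALUES LESS THAN (2007),
        PARTITION p2007 VALUES LESS THAN (2008),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );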

lonegunman asked Oct 08 '08


2 Answers

See "Why MySQL could be slow with large tables" and "COUNT(*) vs COUNT(col)".

Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
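For what it's worth, the first two points might look something like this. 'big_table', 'status', and the my.cnf values are placeholders for illustration, not recommendations for your workload:

    -- Index the column you are counting or filtering on (placeholder names):
    ALTER TABLE big_table ADD INDEX idx_status (status);

    -- Check the current buffer sizes:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'key_buffer_size';

    -- In my.cnf, give the engine more memory to cache data and indexes:
    -- [mysqld]
    -- innodb_buffer_pool_size = 2G    # InnoDB data + index cache
    -- key_buffer_size         = 512M  # MyISAM index cache

Measure before and after; which buffer actually matters depends on whether your tables are InnoDB or MyISAM.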

John Millikin answered Sep 23 '22


What you're asking with "SELECT COUNT(*)" is not easy.

In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.

However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:

Exactly how many records exist in this table in my transaction?

To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast with MySQL's InnoDB, and it isn't going to be fast in Oracle or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs).

Having the whole table in RAM will make it a bit faster, but it's still not going to be fast.
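If a rough figure is good enough, you can ask MySQL for the engine's own estimate instead of scanning. A sketch, with 'mydb' and 'big_table' as placeholder names; for InnoDB the row count returned here is an approximation, not an exact value:

    SHOW TABLE STATUS LIKE 'big_table';

    SELECT TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'big_table';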

If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
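A minimal sketch of the trigger approach (all names here are placeholders). The count stays exact because every insert and delete adjusts it:

    CREATE TABLE row_counts (
        table_name VARCHAR(64) PRIMARY KEY,
        row_count  BIGINT NOT NULL
    );

    -- Seed the counter once from the real table:
    INSERT INTO row_counts VALUES ('big_table', (SELECT COUNT(*) FROM big_table));

    CREATE TRIGGER big_table_after_ins AFTER INSERT ON big_table
    FOR EACH ROW UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'big_table';

    CREATE TRIGGER big_table_after_del AFTER DELETE ON big_table
    FOR EACH ROW UPDATE row_counts SET row_count = row_count - 1 WHERE table_name = 'big_table';

    -- The count is then a single-row lookup:
    SELECT row_count FROM row_counts WHERE table_name = 'big_table';

Be aware that the single counter row becomes a hot spot under heavy concurrent writes.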

If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
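One way to run that scheduled task inside MySQL itself is the event scheduler (5.1+, requires event_scheduler = ON). This is just a sketch reusing the hypothetical row_counts table from above; a cron job running the same statement works just as well:

    CREATE EVENT refresh_big_table_count
    ON SCHEDULE EVERY 10 MINUTE
    DO
        REPLACE INTO row_counts
        SELECT 'big_table', COUNT(*) FROM big_table;

The full COUNT(*) still scans the table, but it happens in the background every 10 minutes instead of on every request.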

MarkR answered Sep 23 '22