 

What techniques are most effective for dealing with millions of records?


I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query take minutes to execute. I ended up making partitions, separating the data into a couple of tables. What I'm asking is: are there any patterns or design techniques for handling this kind of problem (a huge number of records)? Are MSSQL or Oracle better at handling lots of records?

P.S. The COUNT(*) problem stated above is just an example case; in reality the app does CRUD functionality and some aggregate queries (for reporting), but nothing really complicated. It's just that some of these queries take quite a while (minutes) to execute because of the table's volume.
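(For anyone wondering what I mean by partitions: something along these lines, using MySQL's built-in RANGE partitioning. This is just a rough sketch with made-up table and column names, not my actual schema.)

    -- Split a large table by year; each partition can be scanned or dropped on its own.
    CREATE TABLE orders (
        id         BIGINT UNSIGNED NOT NULL,
        created_at DATE NOT NULL,
        amount     DECIMAL(10,2),
        PRIMARY KEY (id, created_at)   -- the partitioning column must be part of every unique key
    )
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2006 VALUES LESS THAN (2007),
        PARTITION p2007 VALUES LESS THAN (2008),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );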

lonegunman asked Oct 08 '08


2 Answers

See "Why MySQL could be slow with large tables" and "COUNT(*) vs COUNT(col)".

Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
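For what it's worth, the first two points might look something like this. 'big_table', 'status', and the my.cnf values are placeholders for illustration, not recommendations for your workload:

    -- Index the column you are counting or filtering on (placeholder names):
    ALTER TABLE big_table ADD INDEX idx_status (status);

    -- Check the current buffer sizes:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'key_buffer_size';

    -- In my.cnf, give the engine more memory to cache data and indexes:
    -- [mysqld]
    -- innodb_buffer_pool_size = 2G    # InnoDB data + index cache
    -- key_buffer_size         = 512M  # MyISAM index cache

Measure before and after; which buffer actually matters depends on whether your tables are InnoDB or MyISAM.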

John Millikin answered Sep 23 '22


What you're asking with "SELECT COUNT(*)" is not easy.

In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.

However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:

Exactly how many records exist in this table in my transaction?

To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast with MySQL's InnoDB, and it isn't going to be fast in Oracle or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs).

Having the whole table in RAM will make it a bit faster, but it's still not going to be fast.
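If a rough figure is good enough, you can ask MySQL for the engine's own estimate instead of scanning. A sketch, with 'mydb' and 'big_table' as placeholder names; for InnoDB the row count returned here is an approximation, not an exact value:

    SHOW TABLE STATUS LIKE 'big_table';

    SELECT TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'big_table';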

If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
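A minimal sketch of the trigger approach (all names here are placeholders). The count stays exact because every insert and delete adjusts it:

    CREATE TABLE row_counts (
        table_name VARCHAR(64) PRIMARY KEY,
        row_count  BIGINT NOT NULL
    );

    -- Seed the counter once from the real table:
    INSERT INTO row_counts VALUES ('big_table', (SELECT COUNT(*) FROM big_table));

    CREATE TRIGGER big_table_after_ins AFTER INSERT ON big_table
    FOR EACH ROW UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'big_table';

    CREATE TRIGGER big_table_after_del AFTER DELETE ON big_table
    FOR EACH ROW UPDATE row_counts SET row_count = row_count - 1 WHERE table_name = 'big_table';

    -- The count is then a single-row lookup:
    SELECT row_count FROM row_counts WHERE table_name = 'big_table';

Be aware that the single counter row becomes a hot spot under heavy concurrent writes.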

If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
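One way to run that scheduled task inside MySQL itself is the event scheduler (5.1+, requires event_scheduler = ON). This is just a sketch reusing the hypothetical row_counts table from above; a cron job running the same statement works just as well:

    CREATE EVENT refresh_big_table_count
    ON SCHEDULE EVERY 10 MINUTE
    DO
        REPLACE INTO row_counts
        SELECT 'big_table', COUNT(*) FROM big_table;

The full COUNT(*) still scans the table, but it happens in the background every 10 minutes instead of on every request.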

MarkR answered Sep 23 '22