I'm a lone developer for a telecoms company, and am after some database design advice from anyone with a bit of time to answer.
I am inserting into one table ~2 million rows each day, these tables then get archived and compressed on a monthly basis. Each monthly table contains ~15,000,000 rows. Although this is increasing month on month.
For every insert I do above I am combining the data from rows which belong together and creating another "correlated" table. This table is currently not being archived, as I need to make sure I never miss an update to the correlated table. (Hope that makes sense) Although in general this information should remain fairly static after a couple of days of processing.
All of the above is working perfectly. However my company now wishes to perform some stats against this data, and these tables are getting too large to provide the results in what would be deemed a reasonable time. Even with the appropriate indexes set.
So I guess after all the above my question is quite simple. Should I write a script which groups the data from my correlated table into smaller tables. Or should I store the queries result sets in something like memcache? I'm already using mysqls cache, but due to having limited control over how long the data is stored for, it's not working ideally.
The main advantages I can see of using something like memcache:
The main disadvantages I can see of using something like memcache:
The main advantages of using MySql
The main disadvantages of using MySql
Apologies for quite a long question. It's helped me to write down these thoughts here anyway, and any advice/help/experience with dealing with this sort of problem would be greatly appreciated.
Many thanks.
Alan
(Another answer from me, different enough that I'll post it separately)
Two questions:
What sort of stats does your company want to generate?
and
After rows are inserted into the database, are they ever changed?
If data doesn't change after insert, then you may be able to build up a separate 'stats' table, that you amend/update as new rows are inserted, or maybe soon after new rows are inserted.
e.g. things like:
Its hard to be any more specific without knowing the details, but depending on the stats you're after, these kind of approaches may help.
If you want to do some analysis of static data from a few days back, you should perhaps consider using something like a OLAP system.
Basicly, this type of system stock intermediate stats in their format to do quick sum(), avg(), count()... on large table.
I think your question is a perfect example of the situation where it's used, but perhaps i think so just because it's my job. =)
Take a look.
I work in a company with similar situation, with millions of inserts monthly.
We adopted the strategy of summarize the data in smaller tables, grouped by certain fields.
In our case, when an insert is performed, it triggers a function which classifies the inserted tuple and increment the summary tables.
From time to time, we move the oldest rows to a backup table, reducing the growth of the main table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With