
Database design for heavy timed data logging

I have an application that receives about 40,000 rows of data each day, and I now have 5 million rows to handle (a 500 MB MySQL 5.0 database).

Currently, those rows are all stored in the same table, which makes it slow to update, hard to back up, etc.

What kind of schema is used in such applications to keep the data accessible long term without running into problems with oversized tables, while still allowing easy backups and fast reads/writes?

Is PostgreSQL better than MySQL for this purpose?

asked Mar 15 '10 by hotips


People also ask

Which database is best for storing logs?

Apache HBase is the Hadoop database: a distributed, scalable big-data store. It is very convenient for storing logs and provides real-time read/write access to your data.

Which database is best for large data?

MongoDB is suitable for hierarchical data storage and is almost 100 times faster than a relational database management system (RDBMS).

What are the 3 database design steps?

The methodology is presented as a step-by-step guide to the three main phases of database design, namely: conceptual, logical, and physical design.

What is the best practice for database design?

The database belongs to its future users, not its creator, so design with them in mind. Stay away from shortcuts, abbreviations, or plurals. Use consistent naming conventions. Don't reinvent the wheel or make things difficult for those who may need to modify the database at some point, which will certainly happen.


2 Answers

1 - 40,000 rows/day is not that big.

2 - Partition your data by insert date: you can easily delete old data this way (see the sketch after this list).

3 - Don't hesitate to go through a datamart step (compute frequently requested metrics in intermediate tables).
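To make points 2 and 3 concrete, here is a minimal sketch in MySQL 5.1 syntax (the question is about MySQL, even though this answer's own experience is with PostgreSQL); the measurements table, its columns, and the daily_metrics table are hypothetical names invented for the example:

    -- Partition the raw table by insert date (one partition per month here);
    -- old data can then be removed by dropping whole partitions.
    CREATE TABLE measurements (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        insert_date DATE   NOT NULL,
        value       DOUBLE NOT NULL,
        PRIMARY KEY (id, insert_date)   -- the partition column must be part of the key
    )
    PARTITION BY RANGE (TO_DAYS(insert_date)) (
        PARTITION p201003 VALUES LESS THAN (TO_DAYS('2010-04-01')),
        PARTITION p201004 VALUES LESS THAN (TO_DAYS('2010-05-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- Deleting a month of old data is a fast metadata operation:
    ALTER TABLE measurements DROP PARTITION p201003;

    -- "Datamart" step: precompute frequently requested metrics into a small
    -- intermediate table instead of scanning the raw rows at query time.
    CREATE TABLE daily_metrics (
        metric_date DATE NOT NULL PRIMARY KEY,
        row_count   INT UNSIGNED NOT NULL,
        avg_value   DOUBLE NOT NULL
    );

    INSERT INTO daily_metrics (metric_date, row_count, avg_value)
    SELECT insert_date, COUNT(*), AVG(value)
    FROM measurements
    WHERE insert_date = CURDATE() - INTERVAL 1 DAY
    GROUP BY insert_date;

With such a layout, purging old data avoids a long DELETE, and reporting queries hit the small daily_metrics table instead of the raw rows.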

FYI, I have used PostgreSQL with tables containing several GB of data without any problem (and without partitioning); INSERT/UPDATE time was constant.

answered Sep 28 '22 by chburd


We have log tables of 100-200 million rows now, and it is quite painful.

  • Backup is impossible; it would require several days of downtime.

  • Purging old data is becoming too painful; it usually ties up the database for several hours.

So far we've only seen these solutions:

  • Backup: set up a MySQL slave. Backing up the slave doesn't impact the main db. (We haven't done this yet, as the logs we load and transform come from flat files; we back up those files and can regenerate the db in case of failure.)

  • Purging old data: the only painless way we've found is to introduce a new integer column that identifies the current date, and partition the tables on that key, per day (requires MySQL 5.1). Dropping old data is then just a matter of dropping a partition, which is fast (see the sketch below).
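A minimal sketch of that scheme, assuming MySQL 5.1 and using hypothetical names (log_entries for the table, day_key for the integer date column):

    -- Integer "day key" column (e.g. 20100315 for 2010-03-15), with one
    -- RANGE partition per day; requires MySQL 5.1.
    CREATE TABLE log_entries (
        id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        day_key   INT UNSIGNED    NOT NULL,   -- e.g. 20100315
        logged_at DATETIME        NOT NULL,
        message   VARCHAR(255)    NOT NULL,
        PRIMARY KEY (id, day_key)             -- partition key must be in the primary key
    ) ENGINE=MyISAM
    PARTITION BY RANGE (day_key) (
        PARTITION p20100315 VALUES LESS THAN (20100316),
        PARTITION p20100316 VALUES LESS THAN (20100317)
    );

    -- Purging a day's data is just dropping its partition (no long table scan):
    ALTER TABLE log_entries DROP PARTITION p20100315;

    -- New per-day partitions can be appended ahead of time as days roll over:
    ALTER TABLE log_entries ADD PARTITION (PARTITION p20100317 VALUES LESS THAN (20100318));

Dropping a partition only touches that partition's own data files, which is why it stays fast even when the table holds hundreds of millions of rows.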

If, in addition, you need to run transactions continuously against these tables (as opposed to just loading data every now and then and mostly querying it), you probably need to look at InnoDB rather than the default MyISAM tables.
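If that is the case, the engine change itself is a single statement (shown here on the hypothetical log_entries table from the sketch above); note that it rewrites the whole table, which can take a long time on hundreds of millions of rows:

    -- Convert from the default MyISAM to InnoDB for row-level locking and
    -- transactional behaviour (this rebuilds the entire table).
    ALTER TABLE log_entries ENGINE=InnoDB;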

answered Sep 28 '22 by nos