I'm currently working on a home-automation project that lets users view their energy usage over a period of time. We currently collect a reading every 15 minutes, and we expect around 2000 users for our first big pilot.
My boss is requesting that we store at least half a year of data. A quick sum leads to an estimate of around 35 million records. Though these records are small (around 500 bytes each), I'm still wondering whether storing them in our database (Postgres) is the right decision.
Does anyone have good reference material and/or advice on how to deal with this amount of information?
You can just write it to the database, and when the volume exceeds what a single database server can handle, you can shard the database (= have multiple subsets of the data sit on different database servers). Benefit: you can keep using a relational DB and don't have to learn anything new.
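As a minimal sketch of that idea, assuming user-based sharding and placeholder connection strings (none of these names come from the question): each user is hashed to a fixed shard, so all of that user's readings land on the same server and per-user queries only hit one database.

```python
# Sketch: route each user's readings to one of several Postgres shards
# by hashing the user id. The DSNs below are placeholders.
import hashlib

SHARD_DSNS = [
    "host=shard0.example.internal dbname=energy",
    "host=shard1.example.internal dbname=energy",
]

def shard_for(user_id: int) -> str:
    """Pick a shard deterministically from the user id."""
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

print(shard_for(42))   # every reading for user 42 goes to the same shard
```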
Another option is a storage system built for horizontal scaling, which can handle large data volumes and many concurrent queries quickly and cost-effectively. These range from NoSQL (non-relational) databases to columnar data warehouses such as Amazon Redshift and Azure Synapse Analytics; established relational systems such as Microsoft SQL Server, Oracle Database, MySQL and IBM DB2 can also cope with this scale when partitioned sensibly.
For now, 35M records of about 0.5 KB each means roughly 17.5 GB of raw data. This fits in a database for your pilot, but you should also think about the step after the pilot. Your boss will not be happy if the pilot is a big success and you then tell him that you cannot add 100,000 users to the system in the coming months without redesigning everything. Moreover, what about a new feature for VIP users that records data every minute...
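As a quick check of those numbers (raw payload only; Postgres row overhead and indexes will add to this), the back-of-envelope calculation looks like this:

```python
# Back-of-envelope storage estimate for the pilot (raw record size only;
# per-row overhead and indexes in Postgres will add to this).
users = 2000
readings_per_day = 24 * 60 // 15      # one reading every 15 minutes
days = 183                            # roughly half a year
record_bytes = 500

records = users * readings_per_day * days
print(f"{records / 1e6:.1f} M records, {records * record_bytes / 1e9:.1f} GB")
# -> 35.1 M records, 17.6 GB
```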
This is a complex issue and the choice you make will restrict the evolution of your software.
For the pilot, keep it as simple as possible to get the product out as cheaply as possible --> OK for a database. But tell your boss that you cannot open the service up like that, and that you will have to change things before taking on 10,000 new users per week.
One thing for the next release: use separate data repositories: one for the user data that is updated frequently, one for your queries/statistics system, ...
You could look at RRD (a round-robin database, e.g. RRDtool) for your next release.
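If RRD itself is not an option, the underlying idea is simple to reproduce: keep a fixed number of consolidated samples per user and let the oldest drop off, so storage never grows. A rough illustration with made-up names and retention (not part of the original answer):

```python
from collections import deque

# Round-robin style store: a fixed-length window of hourly averages per user,
# so old detail is consolidated and disk usage stays constant.
HOURLY_SLOTS = 24 * 365              # keep roughly one year of hourly values

class UserSeries:
    def __init__(self):
        self.hourly = deque(maxlen=HOURLY_SLOTS)  # oldest entry drops off automatically
        self._bucket = []                          # 15-minute readings for the current hour

    def add_reading(self, watts: float):
        self._bucket.append(watts)
        if len(self._bucket) == 4:                 # four 15-minute readings = one hour
            self.hourly.append(sum(self._bucket) / 4)
            self._bucket.clear()

s = UserSeries()
for w in [300, 320, 310, 330]:
    s.add_reading(w)
print(list(s.hourly))   # [315.0]
```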
Also keep the update frequency in mind: 2000 users sending data every 15 minutes means 2.2 updates per second --> OK; 100,000 users sending data every 5 minutes means 333.3 updates per second. I am not sure a simple database can keep up with that, and a single web service server definitely cannot.
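The same arithmetic, written out (the 100,000-user / 5-minute scenario is the hypothetical one from the answer, not a requirement from the question):

```python
# Write-rate estimate: pilot vs. a hypothetical scaled-up deployment.
def updates_per_second(users: int, interval_minutes: int) -> float:
    return users / (interval_minutes * 60)

print(f"{updates_per_second(2000, 15):.1f}/s")     # pilot: ~2.2 updates per second
print(f"{updates_per_second(100_000, 5):.1f}/s")   # later: ~333.3 updates per second
```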
We frequently work with tables like this. Obviously, structure your indexes based on usage (do you read or write a lot, etc.), and think from the start about table partitioning based on some high-level grouping of the data.
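For Postgres specifically, declarative range partitioning by time (available in Postgres 10 and later) is a natural fit for append-only measurement data. A sketch with hypothetical table and column names, run through psycopg2 against a placeholder connection:

```python
import psycopg2

# Hypothetical schema: monthly range partitions keyed on the reading timestamp,
# with an index matching the most common query (one user over a time window).
DDL = """
CREATE TABLE IF NOT EXISTS energy_reading (
    user_id     integer      NOT NULL,
    recorded_at timestamptz  NOT NULL,
    usage_wh    integer      NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE TABLE IF NOT EXISTS energy_reading_2024_01
    PARTITION OF energy_reading
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE INDEX IF NOT EXISTS energy_reading_2024_01_user_time_idx
    ON energy_reading_2024_01 (user_id, recorded_at);
"""

with psycopg2.connect("dbname=energy user=pilot") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```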
Also, you can implement an archiving strategy to keep the live table thin. Historical records are either never touched or only reported on, and neither use needs to sit in the live table, in my opinion.
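One cheap way to do that in Postgres, again sketched with hypothetical names and assuming an `energy_reading_archive` table with the same columns: move rows older than a cutoff into the archive and delete them from the live table in a single statement. (With declarative partitioning, detaching an old partition achieves the same thing even more cheaply.)

```python
import psycopg2

# Move readings older than six months into an archive table in one transaction.
ARCHIVE_SQL = """
WITH moved AS (
    DELETE FROM energy_reading
    WHERE recorded_at < now() - interval '6 months'
    RETURNING *
)
INSERT INTO energy_reading_archive SELECT * FROM moved;
"""

with psycopg2.connect("dbname=energy user=pilot") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(ARCHIVE_SQL)
```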
It's worth noting that we have tables of around 100M records and we don't see a performance problem. A lot of these performance improvements can be made with little pain later, so you could always start with a common-sense solution and tune only when performance proves to be poor.