How can I calculate database design storage costs?

I often have a couple of different schemas in mind when starting a project. After making rough guesses, I realize that some are less optimized for growth or storage space than others. Obviously, the size of the column values is the main factor. But table metadata, indexes, and row headers all play a part as well.

In addition, RDBMS use a completely different approach to data storage than object or key-value databases.

What are some good resources for trying to figure out the cost (or room needed) for database storage?

Note, my question has little to do with choosing a database, but rather with knowing how to make use of each database's design most efficiently. Databases like PostgreSQL, MySQL, and CouchDB all have different target use cases and multiple ways to solve the same problem. So knowing the storage cost of each solution will help inform the choice of the best solution for the schema.

asked Feb 23 '12 by Xeoncross


2 Answers

RDBMS use a completely different approach to data storage than object or key-value databases.

The relational model assumes you don't know what data will be needed in the future, or how data will be accessed in the future. This has proven to be a pretty reliable assumption in my experience.

That's one reason a SQL dbms will let you add indexes as they're needed, and let you drop indexes that have proven useless. It will let you add constraints as they become known--constraints that sometimes require adding more tables--and drop constraints as the requirements change. It will let you add columns as you discover more things that would be good to know. It will let you replace tables with views and replace views with tables. Some dbms will let you create materialized views--their impact on query speed can be dramatic, and their impact on disk usage, devastating.

Useful databases extend their reach. A SQL database, designed according to the relational model, makes it relatively easy to add features nobody dreamed of during the initial design, and without crushing other parts of the system. So they're often called upon to do things their initial designers didn't imagine.

All of these things

  • adding and dropping indexes over time,
  • adding and dropping constraints over time,
  • adding and dropping columns over time,
  • adding and dropping tables over time,

make any estimate of disk usage look like a waste of time. Any one of them alone can drastically change the disk space required for a database.

You can calculate the space required by a row and a page fairly accurately. (Try Google for "YourDBMSname row layout" and "YourDBMSname page layout".) But when you try to multiply by the number of rows required you have to estimate the number of rows. That puts you at the big end of what Steve McConnell calls "the cone of uncertainty".
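To make that concrete, here's a minimal sketch of the row-and-page arithmetic, with the row-count guess bracketed to reflect the cone of uncertainty. The overhead constants are illustrative assumptions (roughly PostgreSQL-shaped), not exact figures for any particular DBMS; substitute the numbers from your DBMS's row-layout and page-layout documentation.

```python
# Rough table-size estimate: per-row bytes -> rows per page -> pages.
# The overhead constants below are assumptions for illustration only;
# look up "yourdbms row layout" / "yourdbms page layout" for real values.

ROW_HEADER_BYTES = 23      # assumed per-row header overhead
PAGE_SIZE = 8192           # assumed page size
PAGE_HEADER_BYTES = 24     # assumed per-page overhead

def estimate_table_bytes(column_bytes, row_count):
    """Estimate on-disk size for a heap table with fixed-width columns."""
    row_bytes = ROW_HEADER_BYTES + sum(column_bytes)
    rows_per_page = (PAGE_SIZE - PAGE_HEADER_BYTES) // row_bytes
    pages = -(-row_count // rows_per_page)   # ceiling division
    return pages * PAGE_SIZE

# Hypothetical table: int (4) + timestamp (8) + 64-byte text column.
cols = [4, 8, 64]

# The row count is the real guess. Early estimates can be off by 4x
# in either direction, so bracket the guess rather than trusting it.
guess = 1_000_000
for rows in (guess // 4, guess, guess * 4):
    print(f"{rows:>10} rows -> {estimate_table_bytes(cols, rows):>12} bytes")
```

The column arithmetic is the easy, accurate part; note how the final answer still swings by an order of magnitude purely on the row-count bracket.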

If you haven't measured disk usage in multiple projects over time at your own company, estimating the impact of those bullet points above is just guessing.

The last Fortune 100 company I worked for had an operational database that had been in production since the 1970s. Hundreds of applications, written in more than 25 programming languages over the course of 40 years, hit that thing every day. (I think it was built on IBM's IMS originally; today it runs on Oracle.)

Even just a few years ago, nobody there imagined that their database would be used to translate engineering drawings and bills of materials into Chinese, and also to produce the customs documents they'd need to get finished products out of China. Implementing those new features required storing additional data about every part and about every design document in their live inventory. Early in that project, our estimates were pretty far off. That's the big end of the cone. (We estimated several things, but not disk usage. We were required to succeed, so whatever design I came up with, somebody would be required to supply the needed disk space.) But when we went live, we knew the exact value for every estimate, because we'd already done the work. (That's the narrow end of the cone.)

So, how do you mitigate the risk of guesswork in a database design and deployment environment? Take a lesson from 1972.

Build a prototype, and measure it.

Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in only one step. An intermediate step called the pilot plant is necessary to give experience in scaling quantities up and in operating in nonprotective environments. . . .

. . . Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built. . . .

The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers.

Fred Brooks, Jr., in The Mythical Man-Month, p 116.
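In that spirit, here's a sketch of the pilot-plant experiment: load representative rows into a throwaway SQLite file and read back the actual file size at two scales. SQLite's storage format differs from whatever DBMS you'll deploy on, and the table and row shapes here are made up, so treat this as the shape of the measurement, not the numbers.

```python
# "Build a prototype, and measure it": insert representative rows into
# a disposable SQLite database and measure the real on-disk size.
# The schema and row contents are hypothetical stand-ins.
import os
import sqlite3
import tempfile

def measure_prototype_bytes(row_count):
    path = os.path.join(tempfile.mkdtemp(), "prototype.db")
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, doc TEXT)"
    )
    rows = ((i, f"part-{i}", "x" * 200) for i in range(row_count))
    con.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()
    return os.path.getsize(path)

# Measure at two scales to see how size actually grows before
# committing to hardware.
small = measure_prototype_bytes(1_000)
large = measure_prototype_bytes(10_000)
print(small, large, round(large / small, 2))
```

Repeating the measurement after adding an index or a column gives you the same empirical answer for the bullet points above, instead of a guess.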

answered Sep 26 '22 by Mike Sherrill 'Cat Recall'


Here's an AskTom article I found helpful, though it is Oracle-specific.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:266215435203

answered Sep 28 '22 by Jim