Best to have hundreds of columns or split into multiple tables?

Tags:

database-design

I'm designing a database of statistics about the operation of mechanical equipment. Each batch of data will contain hundreds of statistics, so I'm trying to decide whether to create a single table with hundreds of columns or to split it up into multiple tables, each containing related statistics. For example, I could have one table containing the statistics related to malfunctions, another table with the statistics related to jams, etc.

Using multiple tables would make the system more complex overall, though conceptually it might be easier for me to deal with several smaller tables than one big one.

Would there be any performance benefits to splitting things up? It seems like querying a table with a few dozen columns would probably be faster than querying one with hundreds of columns.

Does anyone have any experience with this sort of thing? I'm using Oracle for this project, although this is something I'll likely run into with out databases in the future, so answers for any database would be appreciated.

645

asked Jan 09 '09 14:01

Eli Courtwright

4 Answers

I think we need to know more about your design to answer properly. For example, I'm curious that there could be lots of columns relating to malfunctions, lots (of different ones) relating to jams etc. (Isn't a jam just a kind of malfunction anyway?)

Is your design normalised? Presumably you don't have columns like "jam1", "jam2", etc.?!

Assuming the design is good and normalised, the decision as to whether to have one wide table or many narrower ones is a trade-off between various factors:

Do all/most records have statistics of all types? Yes => one table, no => many
Do you often need to query statistics of all types together? Yes => one table, no => many
Do you maintain all the different stats together in the same screen? Yes => one table, no => many
Are you likely to hit any database limits e.g. max 1000 columns per table?

Whichever way you go, you can use views to present the alternative structure for the convenience of the developer:

One table: many views that select stats of particular types
Many tables: a view that joins all the tables together

Update

From your comments, I now know that you have counts of jams at 40 different locations on the machine, and other types of stats are counts of a similar nature. This suggests the following table design:

create table machines (machine_id ... primary key, ...);
create table machine_stats 
   ( machine_id references machines
   , stat_group -- 'jams', 'malfunctions' etc.
   , stat_name  -- 'under the hood', 'behind the door' etc.
   , stat_count 
   );

As someone commented below, these allows you to sum stats more easily - within or across stat types. It is also easily extended if a new stat needs to be added to a stat type.

133

answered Oct 19 '22 19:10

Tony Andrews

When I see hundreds of columns in a table, i tend to suspect the data schema hasn't been properly normalised. Are the hundreds of columns really unique, or are they groups of similar things that can be normalised out into smaller tables?

If you can reduce the number of columns, you are liable to reduce the total amount of data transacted and hence improve performance at a number of levels. For example, if you have a record that contains 1000 bytes of data, and you want to change 1 byte for each record, you risk fetching and storing 999 bytes unnecessarily. This does impact performance.

answered Oct 19 '22 19:10

SmacL

Normalization ensures that you don't repeat data in your schema.

There are limits to how far you should go, of course. JOINS for 7 tables or more are not performant.

But one monster table? I'd break it up.

answered Oct 19 '22 20:10

duffymo

Do you mean 100s of types of statistics?

Some medical databases have tried a schema or idiom which is called "entity attribute value" or "EAV" (you can Google these terms): the reasoning is that there are innumerable different types of fact about a patient, which may or may not have been captured for any given patient, and that EAV is a better way to store this than to have innumerable different columns in a table.

Beware however that EAV is controversial: some say it's a "code smell" and typical newbie mistake; others say that it's useful occasionally (or rarely) but depends on (specifying and having) good support for metadata.

answered Oct 19 '22 19:10

ChrisW

Related questions
                            
                                PHP Access Control System
                            
                                Data localization in SQL Server with GUIDs or...?
                            
                                MongoDB: storage & when to use relationships
                            
                                Design Hours of Operation SQL Table
                            
                                How to handle database purging in Mongodb
                            
                                Database Historization
                            
                                App Engine: Structured Property vs Reference Property for one-to-many relationship
                            
                                SQL database - when to use a separate table vs a column of an existing one?
                            
                                What's the best way to handle I18N in lookup tables?
                            
                                How to retrieve grandchild objects from a parent using linq
                            
                                Best approach to Architect the integration of two separate databases?
                            
                                Difference between storing integer or string in database table
                            
                                How can I assign a number to each row in a table representing the record number?
                            
                                MySQL primary key column type for large tables
                            
                                A Beginner Question on database design
                            
                                Is this a "correct" database design?
                            
                                How to merge user data after login?
                            
                                Rails: dynamic columns/attributes on models?
                            
                                jsonb and primary/foreign keys: which performs better in PostgreSQL?
                            
                                Two foreign keys, one of them not NULL: How to solve this in SQL?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Best to have hundreds of columns or split into multiple tables?

Tags:

database-design

Eli Courtwright

People also ask

4 Answers

Update

Tony Andrews

SmacL

duffymo

ChrisW

Recent Activity

Donate For Us