 

SQL Optimization: how many columns on a table?

In a recent project I have seen tables with anywhere from 50 to 126 columns.

Should a table hold fewer columns, or is it better to separate some columns out into new tables and use relationships? What are the pros and cons?

Jack asked Jan 28 '09


People also ask

How many columns should a SQL table have?

In SQL Server, the maximum number of columns in a table is 1,024. SQL Server also has a wide-table feature that allows a table to have up to 30,000 columns instead of 1,024.

Does number of columns affect performance in SQL?

There will be no performance difference based on the column position.

How many columns should a database table have?

There is no precise guidance. A table could have as few as one column or as many as the maximum, 1,024. However, in general, you'll probably see no more than 10-15 columns per table in a well-normalized database.

Does number of columns affect performance in MySQL?

Yes. Extra data can slow down queries because fewer rows fit into a page, which means more disk accesses are needed to read a given number of rows and fewer rows can be cached in memory.


8 Answers

Generally it's better to design your tables first to model the data requirements and to satisfy rules of normalization. Then worry about optimizations like how many pages it takes to store a row, etc.

I agree with other posters here that the large number of columns is a potential red flag that your table is not properly normalized. But it might be fine in this case. We can't tell from your description.

In any case, splitting the table up just because the large number of columns makes you uneasy is not the right remedy. Is it really causing any defects or a performance bottleneck? You need to measure to be sure, not suppose.
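
One way to do that measuring in MySQL, as a rough sketch (the table orders_wide and its columns are hypothetical names), is to look at the execution plan of a typical query against the wide table:

    -- Hypothetical wide table; see how a typical query is actually executed.
    EXPLAIN
    SELECT customer_id, status, total
    FROM orders_wide
    WHERE customer_id = 42;

If the plan already uses a suitable index and timings on realistic data volumes are acceptable, the column count by itself isn't a reason to split the table.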

Bill Karwin


A good rule of thumb that I've found is simply whether or not a table keeps growing new columns as the project continues.

For instance:

On a project I'm working on, the original designers decided to include site permissions as columns in the user table.

So now we are constantly adding more columns as new features are implemented on the site. Obviously this is not optimal. A better solution would be a table containing permissions and a join table between users and permissions to assign them.
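
A minimal sketch of that better solution, assuming an existing users table keyed by user_id (all names here are illustrative):

    -- Permissions become rows in their own table instead of columns on users.
    CREATE TABLE permissions (
        permission_id INT PRIMARY KEY,
        name          VARCHAR(100) NOT NULL
    );

    -- Join table: granting a new permission is an INSERT here,
    -- not another ALTER TABLE on the users table.
    CREATE TABLE user_permissions (
        user_id       INT NOT NULL,
        permission_id INT NOT NULL,
        PRIMARY KEY (user_id, permission_id),
        FOREIGN KEY (user_id)       REFERENCES users (user_id),
        FOREIGN KEY (permission_id) REFERENCES permissions (permission_id)
    );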

However, for more archival information, or for tables that simply don't have to grow and don't need to be cached, kept to few pages, or filtered efficiently, having a large table doesn't hurt too much, as long as it doesn't hamper maintenance of the project.

At least that is my opinion.

Chris J


Usually an excess of columns points to improper normalization, but it is hard to judge without some more details about your requirements.

Otávio Décio


I can picture times when it might be necessary to have this many columns, or more. Examples would be if you had to denormalize and cache data, or a type of row that genuinely has many attributes. I think the keys are to avoid SELECT * and to make sure you are indexing the right columns and composites.
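
For example (table and column names are hypothetical), a composite index matched to the common filter, plus an explicit column list instead of SELECT *, keeps even a wide, denormalized table workable:

    -- Composite index covering the usual filter on the wide table.
    CREATE INDEX idx_orders_customer_status
        ON orders_wide (customer_id, status);

    -- Name only the columns the query needs rather than SELECT *.
    SELECT order_id, status, total
    FROM orders_wide
    WHERE customer_id = 42
      AND status = 'shipped';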

Scott Miller


If you had an object modelling the data in the database, would you have a single object with 120 fields, or would you look through the data and pull out the parts that are logically distinct? You can inline Address data with Customer data, but it makes sense to move it out into an Addresses table, even if it keeps a 1:1 mapping with the Person.

Down the line you might need to keep a record of their previous address, and by splitting it out you've removed one major problem when refactoring your system.

Are any of the fields duplicated over multiple rows? I.e., are the customer's details replicated, once per invoice? If so, there should be one customer entry in the Customers table and n entries in the Invoices table.
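
That split might look like this sketch (names are hypothetical), where the customer details are stored once and each invoice only references them:

    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(200) NOT NULL,
        email       VARCHAR(200)
    );

    -- n invoices per customer; the customer's details are not copied onto each row.
    CREATE TABLE invoices (
        invoice_id  INT PRIMARY KEY,
        customer_id INT NOT NULL,
        issued_on   DATE NOT NULL,
        total       DECIMAL(10, 2) NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    );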

One place where you should not try to fix broken normalisation is a facts table (for auditing, etc.) whose purpose is to aggregate data to run analyses on. These tables are usually populated from the properly normalised tables, however (overnight, for example).
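
A sketch of that overnight population, reusing the customers and invoices tables sketched above and assuming a hypothetical, already-created invoice_facts table:

    -- Denormalised facts table rebuilt from the normalised tables by a nightly job.
    INSERT INTO invoice_facts (invoice_id, customer_name, total, issued_on)
    SELECT i.invoice_id, c.name, i.total, i.issued_on
    FROM invoices i
    JOIN customers c ON c.customer_id = i.customer_id;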

JeeBee


It sounds like you have potential normalization issues.

If you really want to, you can create a new table for each of those columns (a little extreme) or for each group of related columns, and join it on the ID of each record.
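
A rough sketch of that kind of one-to-one split (names are illustrative, and a products table keyed by product_id is assumed), where the side table shares the record's ID:

    -- Rarely-used columns moved to a side table keyed by the same ID.
    CREATE TABLE product_details (
        product_id       INT PRIMARY KEY,
        long_description TEXT,
        spec_sheet_url   VARCHAR(500),
        FOREIGN KEY (product_id) REFERENCES products (product_id)
    );

    -- Joined back on the ID only when those columns are needed.
    SELECT p.product_id, p.name, d.long_description
    FROM products p
    JOIN product_details d ON d.product_id = p.product_id;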

Joe Phillips


It could certainly affect performance if people are running around with a lot of "Select * from GiantTableWithManyColumns"...
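
If that pattern is the culprit, the cheap fix is to name only the columns each query needs (the column names here are illustrative):

    -- Instead of SELECT * FROM GiantTableWithManyColumns ...
    SELECT id, name, status
    FROM GiantTableWithManyColumns
    WHERE status = 'active';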

Jason Punyon


Here are the official limits for SQL Server 2005: http://msdn.microsoft.com/en-us/library/ms143432.aspx

Keep in mind these are the maximums, and are not necessarily the best for usability.

Think about splitting the 126 columns into sections. For instance, if it is some sort of "person" table you could have:

Person ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode, Telephone, CellPhone, Fax

But you could separate that into Person ID, AddressID, PhoneID

Address ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode

Phone ID, Telephone, CellPhone, Fax

In the second layout you also save yourself from data replication: all the people with the same address share the same AddressID instead of copying the same text over and over.
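
As a sketch of that second layout (column types are assumptions), with addresses shared by ID:

    CREATE TABLE addresses (
        address_id  INT PRIMARY KEY,
        address_num VARCHAR(20),
        address_st  VARCHAR(200),
        apt_no      VARCHAR(20),
        province    VARCHAR(100),
        country     VARCHAR(100),
        postal_code VARCHAR(20)
    );

    CREATE TABLE phones (
        phone_id  INT PRIMARY KEY,
        telephone VARCHAR(30),
        cellphone VARCHAR(30),
        fax       VARCHAR(30)
    );

    -- People living at the same address reuse the same address_id row.
    CREATE TABLE persons (
        person_id  INT PRIMARY KEY,
        address_id INT,
        phone_id   INT,
        FOREIGN KEY (address_id) REFERENCES addresses (address_id),
        FOREIGN KEY (phone_id)   REFERENCES phones (phone_id)
    );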

DevinB