Which is more efficient: one long single table or distributed tables, and why?

This question is about performance, and I would appreciate answers that are specific to the case I describe.

Which is more appropriate performance-wise?

  • creating one table with many fields
  • creating several tables and distributing related fields among them

CASE: An Extensive Web CMS Module

Pattern 1: One long table

cms
-----------------------------------------------
Id
Title
Description
Images
Order
Status
Publish
meta_keywords
meta_description
meta_author

Clearly, most open-source CMSs, such as Joomla, use the pattern above. But I think that pattern goes against the spirit of an RDBMS: we could easily separate the content, configuration, and meta data of an article into different tables, like the following:

Pattern 2: Several related tables

Cms_content         cms_meta        cms_configuration
---------------------------------------------------------------------------
Id                  id              id          
Title               content_id      content_id
Description         keywords        status
Content             description     order
Images              author          publish

Note: the relations in this case are one-to-one.
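A minimal sketch of Pattern 2, again with sqlite3 standing in for MySQL. One assumption-laden way to enforce the one-to-one relation is to make content_id both a foreign key and UNIQUE in each side table, so the database itself rejects a second meta or configuration row for the same article. Types, sample data, and the sort_order rename are illustrative assumptions.

```python
import sqlite3

# Pattern 2: content, meta and configuration split into three tables.
# content_id is a foreign key AND unique, which enforces one-to-one.
SCHEMA = """
CREATE TABLE cms_content (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    content     TEXT,
    images      TEXT
);
CREATE TABLE cms_meta (
    id          INTEGER PRIMARY KEY,
    content_id  INTEGER NOT NULL UNIQUE REFERENCES cms_content(id),
    keywords    TEXT,
    description TEXT,
    author      TEXT
);
CREATE TABLE cms_configuration (
    id          INTEGER PRIMARY KEY,
    content_id  INTEGER NOT NULL UNIQUE REFERENCES cms_content(id),
    status      INTEGER DEFAULT 1,
    sort_order  INTEGER DEFAULT 0,
    publish     INTEGER DEFAULT 0
);
"""


def build_split_tables():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cms_content (id, title) VALUES (1, 'Hello')")
    conn.execute("INSERT INTO cms_meta (content_id, keywords) VALUES (1, 'cms,mysql')")
    conn.execute("INSERT INTO cms_configuration (content_id, status) VALUES (1, 1)")
    return conn


def load_article(conn, article_id):
    # Reassembling one article now needs one JOIN per side table.
    return conn.execute(
        """
        SELECT c.title, cfg.status, m.keywords
        FROM cms_content AS c
        LEFT JOIN cms_meta AS m ON m.content_id = c.id
        LEFT JOIN cms_configuration AS cfg ON cfg.content_id = c.id
        WHERE c.id = ?
        """,
        (article_id,),
    ).fetchone()


conn = build_split_tables()
print(load_article(conn, 1))  # ('Hello', 1, 'cms,mysql')
try:
    conn.execute("INSERT INTO cms_meta (content_id) VALUES (1)")
except sqlite3.IntegrityError as exc:
    print("rejected second meta row:", exc)
```

The UNIQUE constraint is what turns "one-to-one" from a convention into something the database guarantees; without it the schema would silently allow one-to-many.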

Which pattern is the right one to follow? Why would one choose a single long table, or why not choose distributed tables over the single table?

Starx asked Dec 27 '11 10:12


2 Answers

The only plausible reasons I can think of for keeping denormalized data (one table with many columns) are:

  • laziness in writing SQL JOINs
  • possible performance improvements on read statements
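On the first point, one common way to avoid repeating JOINs in application code (a technique swapped in here, not one the answer mentions) is a view that performs the JOIN once. The sketch below uses sqlite3 and a hypothetical two-table split with made-up data; MySQL also supports CREATE VIEW. Note that a plain view only saves typing: the JOIN still runs on every read, so it does not address the second point about read performance.

```python
import sqlite3

# Hypothetical split (content + meta) wrapped in a view so that reads
# look like Pattern 1 queries against one wide table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cms_content (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE cms_meta (
        content_id INTEGER PRIMARY KEY REFERENCES cms_content(id),
        keywords   TEXT,
        author     TEXT
    );
    -- The JOIN is written once, here, instead of in every query.
    CREATE VIEW cms_full AS
        SELECT c.id, c.title, m.keywords AS meta_keywords, m.author AS meta_author
        FROM cms_content AS c
        LEFT JOIN cms_meta AS m ON m.content_id = c.id;
    INSERT INTO cms_content (id, title) VALUES (1, 'Hello'), (2, 'No meta yet');
    INSERT INTO cms_meta (content_id, keywords, author) VALUES (1, 'cms,mysql', 'admin');
""")

# Application code queries the view as if it were a single table.
rows = conn.execute("SELECT id, title, meta_keywords FROM cms_full ORDER BY id").fetchall()
print(rows)  # [(1, 'Hello', 'cms,mysql'), (2, 'No meta yet', None)]
```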

I always prefer the normalized version, because:

  • I can be sure of data integrity
  • I can easily extract information from the DB (for example, how many posts have meta data, how many distinct meta values there are, etc.)
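As an illustration of the second point, the example questions ("how many posts have meta", "how many distinct metas") become short aggregate queries once meta data lives in its own table. This is a sketch with sqlite3, hypothetical table names following Pattern 2, and made-up rows.

```python
import sqlite3


def build_demo():
    """Three posts, two of which have a meta row (made-up sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cms_content (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE cms_meta (
            id         INTEGER PRIMARY KEY,
            content_id INTEGER NOT NULL UNIQUE REFERENCES cms_content(id),
            keywords   TEXT
        );
        INSERT INTO cms_content (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO cms_meta (content_id, keywords) VALUES (1, 'mysql'), (2, 'cms');
    """)
    return conn


def meta_stats(conn):
    # How many posts have meta data?
    with_meta = conn.execute(
        "SELECT COUNT(DISTINCT content_id) FROM cms_meta"
    ).fetchone()[0]
    # How many posts have none? (anti-join via NOT EXISTS)
    without_meta = conn.execute(
        """SELECT COUNT(*) FROM cms_content AS c
           WHERE NOT EXISTS (SELECT 1 FROM cms_meta AS m WHERE m.content_id = c.id)"""
    ).fetchone()[0]
    # How many distinct keyword values are in use?
    distinct_keywords = conn.execute(
        "SELECT COUNT(DISTINCT keywords) FROM cms_meta"
    ).fetchone()[0]
    return with_meta, without_meta, distinct_keywords


print(meta_stats(build_demo()))  # (2, 1, 2)
```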
Tudor Constantin answered Oct 12 '22 15:10


I think the performance of a 'modern' RDBMS-based application (I'm not sure exactly what 'modern' means here) does not depend only on the database schema. It also depends on:

  • Database settings: memory usage strategy, key buffer size, query cache size, etc.
  • Distribution of data and processing: partitioning, grid processing.
  • Caching strategy: an embedded cache engine or an external one (like memcached).
  • Hardware performance

So estimating performance is not a simple problem. A table with 100 fields may fit in memory, while even a two-field table may not. A query over 5M rows can finish in under a minute, yet the same query over 10M rows (only twice as many!) sometimes does not finish within 10 minutes; it depends on the environment factors listed above.

Thus, I don't think a single best practice applies to every case. For your example, the choice comes down to the DBA's taste (not a joke).
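Since the answer depends so heavily on the environment, one practical option is to measure both layouts on your own data. The harness below is a hypothetical sketch: it stores the same made-up rows both ways in an in-memory SQLite database, which is only a stand-in for MySQL, and times the single-table read against the three-table JOIN. For simplicity the side tables use content_id as their primary key (one way to get one-to-one) instead of a separate id column. The numbers it prints say nothing about MySQL with your buffer sizes, caching, and hardware; rerun the same idea against your real server.

```python
import sqlite3
import time


def build(n_rows):
    """Store the same synthetic rows in both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cms (id INTEGER PRIMARY KEY, title TEXT, status INTEGER,
                          meta_keywords TEXT);
        CREATE TABLE cms_content (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE cms_configuration (content_id INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE cms_meta (content_id INTEGER PRIMARY KEY, keywords TEXT);
    """)
    rows = [(i, f"title {i}", i % 2, f"kw{i % 10}") for i in range(1, n_rows + 1)]
    conn.executemany("INSERT INTO cms VALUES (?, ?, ?, ?)", rows)
    conn.executemany("INSERT INTO cms_content VALUES (?, ?)", [(r[0], r[1]) for r in rows])
    conn.executemany("INSERT INTO cms_configuration VALUES (?, ?)", [(r[0], r[2]) for r in rows])
    conn.executemany("INSERT INTO cms_meta VALUES (?, ?)", [(r[0], r[3]) for r in rows])
    return conn


SINGLE = "SELECT title, status, meta_keywords FROM cms WHERE status = 1"
SPLIT = """SELECT c.title, cfg.status, m.keywords
           FROM cms_content AS c
           JOIN cms_configuration AS cfg ON cfg.content_id = c.id
           JOIN cms_meta AS m ON m.content_id = c.id
           WHERE cfg.status = 1"""


def time_query(conn, sql, repeat=5):
    """Return (best wall-clock seconds over `repeat` runs, result rows)."""
    best, result = float("inf"), []
    for _ in range(repeat):
        start = time.perf_counter()
        result = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    conn = build(100_000)
    for label, sql in (("single table", SINGLE), ("split tables", SPLIT)):
        seconds, rows = time_query(conn, sql)
        print(f"{label}: {len(rows)} rows, best of 5: {seconds * 1000:.1f} ms")
```

Taking the best of several runs reduces noise from warm-up and other processes, but it still measures only this engine on this machine.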

lqez answered Oct 12 '22 16:10