Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Column oriented database vs row oriented database

I have used row oriented database design for long time and except for datawarehouse projects and Big data samples, I have not used column oriented database design for OLTP app.

My row oriented table looks like

ID, Make, Model, Month, Miles, Cost
1   BMW   Z3     12     12000  100

Some people in our team advocating column oriented database design. They suggest that all the column names should be property names in a Property table. Then another table Quote will have two columns PropertyName and PropertyValue.

In the .net code, we read each key and compare and convert to strongly typed object. The code is really getting messy.

if (qwi.DomainCode == typeof(CoreBO.Base.iQQConstants.MBPCollateralInfo).Name)
     {
        if (qwi.RefCode == iQQConstants.MBPCollateralInfo.ENGINETYPE)
        {
           Aspiration = qwi.Value;
        }
        else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.FUELTYPE)
        {
           FuelType = qwi.Value;
        }
        else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MAKE)
        {
           Make = qwi.Value;
        }
        else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MILEAGE)
        {
           int reading = 0;
           bool success = int.TryParse(qwi.Value, out reading);
           if (success)
           {
              OdometerReading = reading;
           }
}
}

The arguement for this column oriented design is that we won't have to change table schema and the stored proc(we are still using stored proc instead of Entity Framework).

Seems like we are heading into real problem. Is Column oriented design well accepted in the industry.

like image 762
Ron Avatar asked Sep 16 '13 15:09

Ron


2 Answers

I am having trouble with your terminology. You are describing an EAV structure (standing for Entity-Attribute-Value).

Aside: A "column-oriented" database usually refers to a database that stores each column separately from others (when I learned about databases, this was called "vertical partitioning", but I don't think that caught on). Examples include Paracel and Vertica.

An entity-attribute-value database is storing each attribute for an entity as a separate row.

The first problem that you have with your particular structure is typing. Some of the attributes are strings and some are numbers. This becomes a management nightmare in an EAV world. Either you store everything as strings (losing the ability to type check values and to guarantee that arithmetic words) or you include multiple columns for different types with a type column (making queries much more complicated).

Similarly, constraints and foreign key references are much harder to implement. Also, because you are repeating the entity id and attribute id on each row, the data often takes up more space. NULL values are typically quite space efficient.

On the OLTP side, you have another problem. When you want to insert an entity, you typically want to insert a bunch of attributes as well. One insert has now turned into many inserts, and you'll want to start wrapping these in transactions, affecting performance.

Given all these shortcomings, you might think never use EAV models. There is a place for them. They are particularly useful when attributes are changing over time. Say, if you have an application where users can put in their own information with tags. In such cases, a hybrid approach is the best solution. Use a regular relational table with many columns for the common information. Use an EAV table for optional information for each entity.

like image 103
Gordon Linoff Avatar answered Sep 28 '22 06:09

Gordon Linoff


Source: WIKI

  1. Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
  2. Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
  3. Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
  4. Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row-oriented storage layouts are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column-oriented storage layouts are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possibly terabytes).

like image 35
SriniV Avatar answered Sep 28 '22 08:09

SriniV