Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"horizontal" vs. "vertical" table design, SQL

Apologies if this has been covered thoroughly in the past - I've seen some related posts but haven't found anything that satisfies me with regards to this specific scenario.

I've been recently looking over a relatively simple game with around 10k players. In the game you can catch and breed pets that have certain attributes (i.e. wings, horns, manes). There's currently a table in the database that looks something like this:

-------------------------------------------------------------------------------
| pet_id | wings1 | wings1_hex | wings2 | wings2_hex | horns1 | horns1_hex | ...
-------------------------------------------------------------------------------
|      1 |      1 |     ffffff |   NULL |       NULL |      2 |     000000 | ...
|      2 |   NULL |       NULL |   NULL |       NULL |   NULL |       NULL | ...
|      3 |      2 |     ff0000 |      1 |     ffffff |      3 |     00ff00 | ...
|      4 |   NULL |       NULL |   NULL |       NULL |      1 |     0000ff | ...
etc...

The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes. A new attribute is added every 1-2 months which requires table columns to be added. The table is rarely updated and read frequently.

I've been proposing that we move to a more vertical design scheme for better flexibility as we want to start adding larger volumes of attributes in the future, i.e.:

----------------------------------------------------------------
| pet_id | attribute_id | attribute_color | attribute_position |
----------------------------------------------------------------
|      1 |            1 |          ffffff |                  1 |  
|      1 |            3 |          000000 |                  2 |  
|      3 |            2 |          ffffff |                  1 |  
|      3 |            1 |          ff0000 |                  2 |  
|      3 |            3 |          00ff00 |                  3 |  
|      4 |            3 |          0000ff |                  1 | 
etc...

The old developer has raised concerns that this will create performance issues as users very frequently search for pets with specific attributes (i.e. must have these attributes, must have at least one in this colour or position, must have > 30 attributes). Currently the search is quite fast as there are no JOINS required, but introducing a vertical table would presumably mean an additional join for every attribute searched and would also triple the number of rows or so.

The first part of my question is if anyone has any recommendations with regards to this? I'm not particularly experienced with database design or optimisation.

I've run tests for a variety of cases but they've been largely inconclusive - the times vary quite significantly for all of the queries that I ran (i.e. between half a second and 20+ seconds), so I suppose the second part of my question is whether there's a more reliable way of profiling query times than using microtime(true) in PHP.

Thanks.

like image 934
fluke Avatar asked Feb 13 '12 07:02

fluke


People also ask

What is the difference between the horizontal data format and the vertical data format?

The horizontal format is more compact, and is easier to read so long as the amount of syntactic information interspersed with the words is not too dense. The vertical format is more convenient and more readable if there is too much syntactic information to be conveniently shown in the horizontal format.

What is vertical table in SQL?

In a vertical table, each row contains one value for one data field for a unique key. The following table contains the same data as the relational table, but is organized vertically. Key Value.

How do I make horizontal data vertical in SQL?

The purpose of the PIVOT query is to rotate the output and display vertical data horizontally. These queries are also known as crosstab queries. The SQL Server PIVOT operator can be used to easily rotate/transpose your data. This is a very nice tool if the data values that you wish to rotate are not likely to change.

What is a vertical table database?

This is a specific way of storing data vertically in a table that only has two real data columns and one other (but there could me more) identifier column. The data is stored as key/value pairs vertically; hence, the name.


1 Answers

This is called the Entity-Attribute-Value-Model, and relational database systems are really not suited for it at all.

To quote someone who deems it one of the five errors not to make:

So what are the benefits that are touted for EAV? Well, there are none. Since EAV tables will contain any kind of data, we have to PIVOT the data to a tabular representation, with appropriate columns, in order to make it useful. In many cases, there is middleware or client-side software that does this behind the scenes, thereby providing the illusion to the user that they are dealing with well-designed data.

EAV models have a host of problems.

Firstly, the massive amount of data is, in itself, essentially unmanageable.

Secondly, there is no possible way to define the necessary constraints -- any potential check constraints will have to include extensive hard-coding for appropriate attribute names. Since a single column holds all possible values, the datatype is usually VARCHAR(n).

Thirdly, don't even think about having any useful foreign keys.

Finally, there is the complexity and awkwardness of queries. Some folks consider it a benefit to be able to jam a variety of data into a single table when necessary -- they call it "scalable". In reality, since EAV mixes up data with metadata, it is lot more difficult to manipulate data even for simple requirements.

The solution to the EAV nightmare is simple: Analyze and research the users' needs and identify the data requirements up-front. A relational database maintains the integrity and consistency of data. It is virtually impossible to make a case for designing such a database without well-defined requirements. Period.


The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes.

That looks like a case for normalization: Break the table into multiple, for example one for horns, one for wings, all connected by foreign key to the main entity table. But do make sure that every attribute still maps to one or more columns, so that you can define constraints, data types, indexes, and so on.

like image 113
Thilo Avatar answered Oct 22 '22 23:10

Thilo