Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query performance optimization for dynamically joined columns

Current situation in SQL Server database

There is a table Entry with the following columns:

  • EntryID (int)
  • EntryName (nvarchar)
  • EntrySize (int)
  • EntryDate (datetime)

Further there should be the possibility to save additional metadata for an Entry. Names and values of these metadata should be free to choose and there should be the possibility to dynamically add those without changing the table structure of the database. Each metadata key can be one of the following data types:

  • Text
  • Numeric value
  • DateTime
  • Boolean value (True/False)

Thus there is a table DataKey to represent the metadata names and datatypes with the following columns:

  • DataKeyID (int)
  • DataKeyName (nvarchar)
  • DataKeyType (smallint) 0: Text; 1: Numeric; 2: DateTime; 3: Bit

In table DataValue for each combination of Entry and DataKey values can be inserted depending on the data type of the metadata key. For each data type there is one nullable value column. This table has the following columns:

  • DataValueID (int)
  • EntryID (int) Foreign-Key
  • DataKeyID (int) Foreign-Key
  • TextValue (nvarchar) Nullable
  • NumericValue (float) Nullable
  • DateValue (datetime) Nullable
  • BoolValue (bit) Nullable

Image of the database structure:

enter image description here

TARGET

Target is to retrieve a list of entries fulfilling the specifications like in a WHERE clause. Like the following example:

Assumption:

  • Meta data key KeyName1 is text
  • Meta data key KeyName2 is DateTime
  • Meta data key KeyName3 is numeric
  • Meta data key KeyName4 is Boolean

Query:

... WHERE (KeyName1 = „Test12345“ AND KeyName2 BETWEEN ’01.09.2012 00:00:00’ AND
’01.04.2013 23:59:00’) OR (KeyName3 > 15.3 AND KeyName4 = True)

Target is to do these queries in a very efficient way, also with a large amount of data like

  • Number of entries > 2.000.000
  • Number of data keys between 50 und 100 or maybe > 100
  • Per entry at least a subset of values specified or maybe also a value for each key (2.000.000 * 100)

PROBLEM

The first problem arises when building the query. Normally queries require to have sets with columns that can be used in the WHERE clause. In this case the columns used in the queries are entries in table DataKey as well to be able to dynamically add metadata without having to change the database table structure. During research a solution has been found using PIVOT table techniques at runtime. But it turned out that this solution is very slow when there is a large set of data in the database.

QUESTIONS

  • Is there a more efficient way or structure to save the data for this purpose?
  • How can the requirements listed above be fulfilled, also with regard to performance and time consumption when querying?

Here is a sql fiddle with the discribed database structure and some sample data: http://www.sqlfiddle.com/#!3/d1912/3

like image 672
Rob Avatar asked Sep 05 '13 07:09

Rob


People also ask

What are the different techniques used for query optimization?

Because SQL is a nonprocedural language, the optimizer can merge, restructure, and process data in any sequence. Query Optimization in DBMS has a significant application in database designing. There are two types of query optimization in DBMS: Cost-Based Optimization and Adaptive Query Optimization.


1 Answers

One of the fundamental flaws in an Entity Attribute Value design (which is what you have here) is the difficulty of efficient and performant querying.

The more efficient structure for storing data is to abandon EAV and use a normalised relational form. But that will necessarily involve changing the structure of the database when the data structures change (which should be self evident).

You could abandon your TextValue/NumericValue/DateValue/BoolValue fields and replace them with a single sql_variant column, which would reduce your query complexity slightly, but the fundamental problem will remain.

As a side note, storing all numerics as floats will cause problems if you ever have to deal with money.

like image 69
podiluska Avatar answered Oct 01 '22 15:10

podiluska