Query performance optimization for dynamically joined columns

Current situation in SQL Server database

There is a table Entry with the following columns:

EntryID (int)
EntryName (nvarchar)
EntrySize (int)
EntryDate (datetime)

In addition, it should be possible to store additional metadata for an Entry. The names and values of this metadata should be freely choosable, and it should be possible to add new metadata keys dynamically without changing the table structure of the database. Each metadata key has one of the following data types:

Text
Numeric value
DateTime
Boolean value (True/False)

The table DataKey therefore stores the metadata names and data types, with the following columns:

DataKeyID (int)
DataKeyName (nvarchar)
DataKeyType (smallint) 0: Text; 1: Numeric; 2: DateTime; 3: Bit

The table DataValue stores one value for each combination of Entry and DataKey. It has one nullable value column per data type, and the column that is filled depends on the data type of the metadata key. The table has the following columns:

DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
TextValue (nvarchar) Nullable
NumericValue (float) Nullable
DateValue (datetime) Nullable
BoolValue (bit) Nullable
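
As a rough T-SQL sketch, the three tables described above could be declared as follows. The column names and types come from the lists above; the nvarchar lengths, the NOT NULL choices, the primary keys, and the constraint names are assumptions made only so that the script is complete.

```sql
-- Sketch of the three tables described above (T-SQL).
-- Assumed, not stated in the question: nvarchar lengths, NOT NULL choices,
-- primary keys, and constraint names.
CREATE TABLE Entry (
    EntryID   int           NOT NULL PRIMARY KEY,
    EntryName nvarchar(255) NOT NULL,
    EntrySize int           NOT NULL,
    EntryDate datetime      NOT NULL
);

CREATE TABLE DataKey (
    DataKeyID   int           NOT NULL PRIMARY KEY,
    DataKeyName nvarchar(255) NOT NULL,
    -- 0: Text; 1: Numeric; 2: DateTime; 3: Bit
    DataKeyType smallint      NOT NULL
        CONSTRAINT CK_DataKey_Type CHECK (DataKeyType IN (0, 1, 2, 3))
);

CREATE TABLE DataValue (
    DataValueID  int           NOT NULL PRIMARY KEY,
    EntryID      int           NOT NULL
        CONSTRAINT FK_DataValue_Entry REFERENCES Entry (EntryID),
    DataKeyID    int           NOT NULL
        CONSTRAINT FK_DataValue_DataKey REFERENCES DataKey (DataKeyID),
    -- Exactly one of the following is expected to be filled per row,
    -- matching DataKey.DataKeyType.
    TextValue    nvarchar(255) NULL,
    NumericValue float         NULL,
    DateValue    datetime      NULL,
    BoolValue    bit           NULL
);
```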

[Image: diagram of the database structure]

TARGET

The goal is to retrieve a list of entries that satisfy conditions like those in a WHERE clause, as in the following example:

Assumption:

Meta data key KeyName1 is text
Meta data key KeyName2 is DateTime
Meta data key KeyName3 is numeric
Meta data key KeyName4 is Boolean

Query:

... WHERE (KeyName1 = 'Test12345' AND KeyName2 BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00') OR (KeyName3 > 15.3 AND KeyName4 = True)

These queries should run very efficiently, even with a large amount of data such as:

Number of entries > 2,000,000
Number of data keys between 50 and 100, or maybe > 100
Per entry at least a subset of values specified, or maybe also a value for each key (2,000,000 * 100)

PROBLEM

The first problem arises when building the query. Normally a query needs a set with real columns that can be referenced in the WHERE clause. Here, the "columns" used in the filter are rows in the DataKey table, so that metadata can be added dynamically without changing the database table structure. During research, a solution using PIVOT techniques at runtime was found, but it turned out to be very slow when the database holds a large amount of data.
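
To illustrate the kind of runtime PIVOT meant here, a minimal sketch follows. It is not the original query (that is only in the linked fiddle); it is restricted to a single numeric key so that only one value column is involved, and the aliases `src` and `p` are made up for the example. With many keys, the IN list would have to be generated with dynamic SQL.

```sql
-- Hypothetical sketch: turn DataKey rows into columns with PIVOT,
-- then filter on the pivoted column.
SELECT p.EntryID
FROM (
    SELECT v.EntryID, k.DataKeyName, v.NumericValue
    FROM DataValue AS v
    JOIN DataKey   AS k ON k.DataKeyID = v.DataKeyID
    WHERE k.DataKeyType = 1          -- numeric keys only
) AS src
PIVOT (
    -- PIVOT requires an aggregate; each Entry/key pair is assumed
    -- to have at most one value, so MAX just picks that value.
    MAX(NumericValue) FOR DataKeyName IN ([KeyName3])
) AS p
WHERE p.KeyName3 > 15.3;
```

The filter is written against the pivoted result rather than against DataValue itself, which may be one reason this pattern scales poorly on large tables.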

QUESTIONS

Is there a more efficient way or structure to save the data for this purpose?
How can the requirements listed above be fulfilled, also with regard to performance and time consumption when querying?

Here is a SQL Fiddle with the described database structure and some sample data: http://www.sqlfiddle.com/#!3/d1912/3

asked Sep 05 '13 07:09

Rob

1 Answer

One of the fundamental flaws in an Entity Attribute Value design (which is what you have here) is the difficulty of efficient and performant querying.

The more efficient structure for storing data is to abandon EAV and use a normalised relational form. But that will necessarily involve changing the structure of the database when the data structures change (which should be self evident).
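
A minimal sketch of what that could look like, assuming the four example keys from the question become typed columns. The table name `EntryMetadata`, the index name, the nvarchar length, and the extra column `KeyName5` are hypothetical, and the dates assume the question's literals are in day.month.year order.

```sql
-- Hypothetical normalised alternative: each metadata key is a typed column.
CREATE TABLE EntryMetadata (
    EntryID  int           NOT NULL PRIMARY KEY
        REFERENCES Entry (EntryID),
    KeyName1 nvarchar(255) NULL,
    KeyName2 datetime      NULL,
    KeyName3 float         NULL,
    KeyName4 bit           NULL
);

-- Ordinary indexes can now support the filter directly.
CREATE INDEX IX_EntryMetadata_KeyName1_KeyName2
    ON EntryMetadata (KeyName1, KeyName2);

-- The example filter from the question becomes a plain WHERE clause.
-- Dates are written in ISO 8601 form (assumed dd.MM.yyyy in the question).
SELECT e.EntryID, e.EntryName
FROM Entry AS e
JOIN EntryMetadata AS m ON m.EntryID = e.EntryID
WHERE (m.KeyName1 = N'Test12345'
       AND m.KeyName2 BETWEEN '2012-09-01T00:00:00' AND '2013-04-01T23:59:00')
   OR (m.KeyName3 > 15.3 AND m.KeyName4 = 1);

-- The cost: a new metadata key requires a schema change.
ALTER TABLE EntryMetadata ADD KeyName5 nvarchar(255) NULL;
```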

You could abandon your TextValue/NumericValue/DateValue/BoolValue fields and replace them with a single sql_variant column, which would reduce your query complexity slightly, but the fundamental problem will remain.
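
As a rough sketch of the sql_variant idea (the table name `DataValueVariant` and column name `Value` are made up for illustration):

```sql
-- Hypothetical variant of DataValue: one sql_variant column replaces
-- TextValue / NumericValue / DateValue / BoolValue.
CREATE TABLE DataValueVariant (
    DataValueID int         NOT NULL PRIMARY KEY,
    EntryID     int         NOT NULL REFERENCES Entry (EntryID),
    DataKeyID   int         NOT NULL REFERENCES DataKey (DataKeyID),
    Value       sql_variant NULL
);

-- Filtering still goes through the key lookup, so the EAV shape remains.
-- The literal is cast to float so that it is compared with stored float
-- values by value; sql_variant comparisons across different type families
-- are ordered by family rather than by value.
SELECT v.EntryID
FROM DataValueVariant AS v
JOIN DataKey AS k ON k.DataKeyID = v.DataKeyID
WHERE k.DataKeyName = N'KeyName3'
  AND v.Value > CAST(15.3 AS float);
```

The query no longer has to pick a value column per data type, but it still needs one lookup per key, which is the part that hurts.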

As a side note, storing all numerics as floats will cause problems if you ever have to deal with money.
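
A small illustration of why, assuming nothing beyond standard T-SQL types: float is binary floating point and cannot represent values such as 0.1 exactly, whereas decimal stores them exactly at the declared scale.

```sql
-- Compare repeated addition of 0.1 in float and in decimal.
DECLARE @f float          = 0.1;
DECLARE @d decimal(19, 4) = 0.1;

SELECT
    CASE WHEN @f + @f + @f = CAST(0.3 AS float)
         THEN 'equal' ELSE 'not equal' END AS FloatSum,
    CASE WHEN @d + @d + @d = 0.3
         THEN 'equal' ELSE 'not equal' END AS DecimalSum;
```

If monetary values are expected, declaring NumericValue as a decimal with a suitable precision and scale would avoid this class of rounding surprise.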

answered Oct 01 '22 15:10

podiluska

Tags: performance, sql, sql-server, tsql, database-performance
