I'm creating some database tables for storing log entries. Under normal circumstances I would always normalize and never stuff values together, but I'm not 100% convinced that's a good idea here.
I could normalize and have:
LogEntry has a many to many relationship with LogCategory and a LogEntry has a 1 to many with LogEntryProperty (which are name/value pairs).
Alternative is a denormalized version which has just LogEntry with categories stored as a comma delimited list of string categories and properties stored as a comma limited list of Name: Value formatted properties. As ugly as this sounds, from a reporting, performance, and searchability perspective, I'm not sure if this isn't better.
Which is a better idea?
Thanks.
Log table (logarithm table) is used in performing bigger calculations (of multiplication, division, squares, and roots) without using a calculator. The logarithm of a number to a given base is the exponent by which that base should be raised to give the original number.
Database logging stores a record of changes to tables or fields in the database log table. The business uses of database logging include: Creating an audit record of changes to specific tables that contain sensitive information.
SQL Server tables are contained within database object containers that are called Schemas. The schema also works as a security boundary, where you can limit database user permissions to be on a specific schema level only. You can imagine the schema as a folder that contains a list of files.
Because there are only a few distinct properties, I would stay away from name-value pairs and give each property a separate table with a proper name and data-type. I have used generic Property_
, just for the demo.
The thing here is to make sure not to insert a value into a property table if it is missing, in other words all property values are NOT NULL.
To make life easier, define a view
create view dbo.vLogs AS
select
LogCategoryName
, LogTime
, p1_Value
, p2_Value
, p3_Value
, p4_Value
, p5_Value
from LogEntry as e
left join Property_1 as p1 on p1.LogEntryId = e.LogEntryId
left join Property_2 as p2 on p2.LogEntryId = e.LogEntryId
left join Property_3 as p3 on p3.LogEntryId = e.LogEntryId
left join Property_4 as p4 on p4.LogEntryId = e.LogEntryId
left join Property_5 as p5 on p5.LogEntryId = e.LogEntryId
left join LogEntryCategory as x on x.LogEntryId = e.LogEntryId
left join LogCategory as c on c.LogCategoryID = x.LogCategoryID
This view (query) looks complicated and long; however, if you try a query like the one below and look at the execution plan, you may notice that property tables which are not mentioned in the select list are not included in the plan (not touched).
select
LogCategoryName
, LogTime
, p1_Value
, p2_Value
from dbo.vLogs
where LogCategoryName = 'some_category'
and LogTime between from_time and to_time
and if you need something simple like this
select max(p1_Value)
from dbo.vLogs
where LogTime between '2011-07-18' and '2011-07-19'
Here is the execution plan, as you can see only two tables are involved.
This is called table (join) elimination and you do need SQL Server, Oracle, PostgreSql 9.x, ... for this to work -- will not work on MySql (yet).
Each time a property is added, you would have to add a new table and modify the view.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With