I'm building up a new database in SQL Server 2008 for some reporting, and there are many common business rules pertaining to this data that go into different types of reports. Currently these rules are mostly combined in larger procedural programs, in a legacy language, which I'm trying to move over to SQL. I'm shooting for flexibility in implementing reporting from this data, like some reporting in SAS, some in C#, etc.
My approach currently is to break up these common rules (usually VERY simple logic) and encapsulate them in individual SQL UDFs. Performance is not a concern, I just want to use these rules to populate static fields in a sort of reporting "snapshot", which can then be used to report from in whatever way you want.
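To make the approach concrete, one of these encapsulated rules might look like this (a sketch; the table, columns, and the 90-day rule are all made up for illustration):

```sql
-- Hypothetical example of one small encapsulated business rule:
-- a customer counts as "active" if they ordered in the last 90 days.
CREATE FUNCTION dbo.fn_IsActiveCustomer(@customer_id INT)
RETURNS BIT
AS
BEGIN
    DECLARE @ret BIT = 0
    IF EXISTS (SELECT 1
               FROM orders
               WHERE customer_id = @customer_id
                 AND order_date >= DATEADD(DAY, -90, GETDATE()))
        SET @ret = 1
    RETURN @ret
END
```

The snapshot population step would then just call `dbo.fn_IsActiveCustomer(customer_id)` for each row it writes.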
I like this modular approach as far as understanding what each rule is doing (and maintaining the rules themselves), but I'm also starting to worry that the maintenance may become a nightmare. Some rules depend on others, but I can't really get away from that - these things build off each other...which is what I want...I think? ;)
Are there better approaches for this kind of modularity in a database? Am I on the right track, or am I thinking of this in too much of an application-development mindset?
DATA logic should be in the database. The way you manipulate the database structures to perform the activities required by the functionality SHOULD BE held in the database. When you join up these activities into a larger series of processes (business logic) then THIS can be consolidated in the application.
"The default stance in designing an application should be that business logic is held in the application code, NOT in database stored procedures. Only move business logic into stored procedures as performance needs require."
Business logic is the programming that manages communication between an end user interface and a database. The main components of business logic are business rules and workflows.
At some point, extensive use of scalar UDFs will start to cause performance problems: they are executed once for each row in your resultset, and they hide their logic from the optimizer, making it hard to use indexes (i.e. I don't really understand how performance can not be an issue, but you know your requirements best). For certain functionality they are great, but use them sparingly.
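One way to keep the modularity without the per-row cost is an inline table-valued function, which the optimizer can expand into the calling query instead of executing row by row (a sketch; the table and function names are made up):

```sql
-- Inline TVF: its body is merged into the calling query's plan,
-- unlike a scalar UDF, which is invoked once per row.
CREATE FUNCTION dbo.tf_UserScore(@user_id INT)
RETURNS TABLE
AS
RETURN
(
    SELECT SUM(vote) AS user_score
    FROM user_votes
    WHERE user_id = @user_id
)
```

It is consumed with `CROSS APPLY`, e.g. `SELECT u.user_id, s.user_score FROM users u CROSS APPLY dbo.tf_UserScore(u.user_id) s`.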
Keeping logic on the database side is almost always the right thing to do.
As you mentioned in your question, most business rules involve quite simple logic but it usually deals with huge volumes of data.
The database engine is the right place to implement that logic because, first, it keeps data I/O to a minimum, and, second, the database performs most data transformations much more efficiently.
Some time ago I wrote a very subjective blog post on this topic.
One side note: a UDF is not the same as a stored procedure. A UDF is a function designed to be callable inside a query, so it can perform only a very limited subset of possible operations. You can do much more in a stored procedure.
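For instance, a stored procedure can modify data and use side-effecting statements that a UDF cannot (a sketch; the snapshot table and procedure name are hypothetical):

```sql
-- A stored procedure may INSERT, UPDATE, TRUNCATE and call other
-- procedures; a UDF is restricted to read-only logic usable in a query.
CREATE PROCEDURE dbo.usp_RefreshReportingSnapshot
AS
BEGIN
    TRUNCATE TABLE reporting_snapshot   -- not allowed inside a UDF

    INSERT INTO reporting_snapshot (user_id, user_score)
    SELECT u.user_id, SUM(uv.vote)
    FROM users u
    LEFT JOIN user_votes uv
        ON uv.user_id = u.user_id
    GROUP BY u.user_id
END
```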
Update:
In the example you gave, like changing the logic that calculates a "derived field", a UDF that calculates the field is OK.
But (just in case) when performance becomes an issue (and believe me, this happens much sooner than one may think), transforming data with set-based operations may be much more efficient than using UDFs.
In that case, you may want to create a view, a stored procedure or a table-valued function returning a resultset, which lets you write a more efficient query rather than limiting yourself to updating the UDFs (which are record-based).
One example: your query has something like a "user score" which you feel is subject to change, so you wrap it in a UDF:
SELECT user_id, dbo.fn_getUserScore(user_id) AS user_score
FROM users
Initially, this is just a plain field in the table:
CREATE FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
    DECLARE @ret INT
    SELECT @ret = user_score
    FROM users
    WHERE user_id = @user_id
    RETURN @ret
END
, then you decide to calculate it using data from another table:
CREATE FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
    DECLARE @ret INT
    SELECT @ret = SUM(vote)
    FROM user_votes
    WHERE user_id = @user_id
    RETURN @ret
END
This will condemn the engine to using the least efficient NESTED LOOPS
algorithm in either case.
But if you create a view and rewrite the underlying queries like this:
SELECT user_id, user_score
FROM users
SELECT u.user_id, SUM(uv.vote) AS user_score
FROM users u
LEFT JOIN user_votes uv
    ON uv.user_id = u.user_id
GROUP BY u.user_id
, this would give the engine much wider space for optimization while still keeping the resultset structure and separating logic from presentation.
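Wrapped as a view, the second version might look like this (a sketch; the view name is made up):

```sql
-- Reports select from the view; if the scoring logic changes later,
-- only the view body changes, not the callers.
CREATE VIEW dbo.vw_UserScores
AS
SELECT u.user_id, SUM(uv.vote) AS user_score
FROM users u
LEFT JOIN user_votes uv
    ON uv.user_id = u.user_id
GROUP BY u.user_id
```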
SQL is set-based, and inherently performs poorly when a modular, row-by-row approach is applied.
Functions, stored procedures and/or views - they all abstract the underlying logic. The performance problem comes into play when you use two (or more) functions/etc. that utilize the same table(s). It means that two queries are made to the same table(s) when one would have sufficed.
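As an illustration, two scalar UDFs over the same table force two lookups per row, where one set-based pass would do (the second function name is hypothetical):

```sql
-- Two per-row function calls, each probing user_votes separately:
-- SELECT user_id,
--        dbo.fn_getUserScore(user_id),
--        dbo.fn_getUserVoteCount(user_id)   -- hypothetical second UDF
-- FROM users

-- One set-based query that touches user_votes once:
SELECT u.user_id,
       SUM(uv.vote)   AS user_score,
       COUNT(uv.vote) AS vote_count
FROM users u
LEFT JOIN user_votes uv
    ON uv.user_id = u.user_id
GROUP BY u.user_id
```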
The use of multiple functions says to me that the data model was made to be very "flexible". To me, that means questionable data typing and overall column/table definition. There's a need for functions/etc because the database will allow anything to be stored, which means the possibility of bad data is very high. I'd rather put the effort into always having good/valid data, rather than working after the fact to combat existing bad data.
The database is the place to contain this logic. It is faster than application code, and most importantly - centralized to minimize maintenance.