
I'm using SQL UDFs to encapsulate simple reporting/business logic. Should I avoid this?

I'm building up a new database in SQL Server 2008 for some reporting, and there are many common business rules pertaining to this data that go into different types of reports. Currently these rules are mostly combined in larger procedural programs, in a legacy language, which I'm trying to move over to SQL. I'm shooting for flexibility in implementing reporting from this data, like some reporting in SAS, some in C#, etc.

My approach currently is to break up these common rules (usually VERY simple logic) and encapsulate them in individual SQL UDFs. Performance is not a concern; I just want to use these rules to populate static fields in a sort of reporting "snapshot", which can then be reported from in whatever way you want.
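For illustration only, a rule of this kind and its use to fill a snapshot column might look like the sketch below. The function `dbo.fn_IsLargeOrder`, the table `dbo.report_snapshot`, its columns, and the threshold of 1000 are all hypothetical and are not taken from the actual schema described here.

```sql
-- Hypothetical rule: an order counts as "large" when its total is at least 1000.
CREATE FUNCTION dbo.fn_IsLargeOrder (@order_total DECIMAL(12, 2))
RETURNS BIT
AS
BEGIN
    RETURN CASE WHEN @order_total >= 1000 THEN 1 ELSE 0 END;
END;
GO

-- Populate the static snapshot column once (hypothetical table and columns),
-- so that reports read the stored value instead of re-evaluating the rule.
UPDATE dbo.report_snapshot
SET    is_large_order = dbo.fn_IsLargeOrder(order_total);
```

Because the rule runs only when the snapshot is built, its per-row cost is paid once rather than on every report query.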

I like this modular approach as far as understanding what each rule is doing (and maintaining the rules themselves), but I'm also starting to become a bit afraid that the maintenance may also become a nightmare. Some rules depend on others, but I can't really get away from that - these things build off each other...which is what I want...I think? ;)

Are there better ways to achieve this kind of modularity in a database? Am I on the right track, or am I thinking of this in too much of an application-development mindset?

chucknelson asked Jan 28 '10 22:01



3 Answers

At some point, extensive use of UDFs will start to cause performance problems: they are executed for each row in your resultset and hide logic from the optimizer, making it hard to use indexes. (I don't really understand how performance can not be an issue, but you know your requirements best.) For certain functionality they are great, but use them sparingly.
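A minimal sketch of the index problem, under assumed names: the table `dbo.orders`, its index, and the function `dbo.fn_FiscalYear` are invented for illustration, and the fiscal year is assumed to coincide with the calendar year.

```sql
-- Hypothetical table with an index on the date column.
CREATE TABLE dbo.orders (
    order_id   INT      NOT NULL PRIMARY KEY,
    order_date DATETIME NOT NULL
);
CREATE INDEX ix_orders_order_date ON dbo.orders (order_date);
GO

-- Hypothetical scalar UDF wrapping a trivial rule.
CREATE FUNCTION dbo.fn_FiscalYear (@d DATETIME)
RETURNS INT
AS
BEGIN
    RETURN YEAR(@d);
END;
GO

-- UDF in the predicate: the function is evaluated for every row, and the
-- optimizer cannot seek on ix_orders_order_date because the column is wrapped.
SELECT order_id
FROM   dbo.orders
WHERE  dbo.fn_FiscalYear(order_date) = 2010;

-- Same rule written as a range on the bare column: an index seek is possible.
SELECT order_id
FROM   dbo.orders
WHERE  order_date >= '20100101'
  AND  order_date <  '20110101';
```

Both queries return the same rows under the stated assumption; only the second exposes the rule to the optimizer.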

davek answered Oct 13 '22 07:10


Keeping logic on the database side is almost always the right thing to do.

As you mentioned in your question, most business rules involve quite simple logic but it usually deals with huge volumes of data.

The database engine is the right place to implement that logic because, first, it keeps data I/O to a minimum and, second, the database performs most data transformations much more efficiently.

Some time ago I wrote a very subjective blog post on this topic:

  • Schema Junk

One side note: a UDF is not the same as a stored procedure.

A UDF is a function designed to be callable inside a query, so it can do only a very limited subset of possible operations.

You can do much more in a stored procedure.

Update:

In the example you gave, like changing logic that calculates a "derived field", the UDF that calculates the field is OK.

But (just in case) when performance becomes an issue (and believe me, this will be much sooner than one may think), transforming data with set-based operations may be much more efficient than using UDFs.

In this case, you may want to create a view, a stored procedure or a table-valued function returning a resultset, which can contain a more efficient query, rather than limiting yourself to updating the UDFs (which are record-based).

One example: your query has something like a "user score" which you feel is subject to change, so you wrap it in a UDF:

-- In SQL Server a scalar UDF must be called with its schema name (dbo here)
SELECT  user_id, dbo.fn_getUserScore(user_id) AS user_score
FROM    users

Initially, this is just a plain field in the table:

CREATE FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- T-SQL assigns to a variable with SELECT @var = ..., not SELECT ... INTO
        SELECT  @ret = user_score
        FROM    users
        WHERE   user_id = @user_id
        RETURN @ret
END

Then you decide to calculate it using data from another table:

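-- The function already exists at this point, so it is altered rather than created
ALTER FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- Sum the votes for this one user; runs once per calling row
        SELECT  @ret = SUM(vote)
        FROM    user_votes
        WHERE   user_id = @user_id
        RETURN @ret
END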

This will condemn the engine to using the least efficient NESTED LOOPS algorithm in either case.

But if you created a view and rewrote the underlying query, first like this:

SELECT  user_id, user_score
FROM    users

and later like this:

-- One row per user; users without votes get a NULL score
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    users u
LEFT JOIN
        user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id

then this would give the engine much more room for optimization while still keeping the resultset structure and separating logic from presentation.
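As a sketch of that view-based variant, assuming the `users` and `user_votes` tables from the example live in the `dbo` schema; the view name `dbo.v_user_scores` is hypothetical:

```sql
-- Hypothetical view wrapping the set-based score calculation.
CREATE VIEW dbo.v_user_scores
AS
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    dbo.users u
LEFT JOIN
        dbo.user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id;
GO

-- Callers keep selecting the same two columns, whatever the view does inside.
SELECT  user_id, user_score
FROM    dbo.v_user_scores;
```

If the scoring rule changes again, only the view definition needs an `ALTER VIEW`; queries against `user_id` and `user_score` stay as they are.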

Quassnoi answered Oct 13 '22 06:10


SQL is set-based, and inherently performs poorly when applying a modular approach.
Functions, stored procedures and/or views all abstract the underlying logic. The performance problem comes into play when you use two (or more) functions/etc. that utilize the same table(s). It means that two queries are made to the same table(s) when one could have been used.
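A hedged illustration of that point, reusing the `users` and `user_votes` tables and the `fn_getUserScore` function from the example in another answer; `dbo.fn_getVoteCount` is a hypothetical second scalar UDF that counts a user's rows in `user_votes`:

```sql
-- Two scalar UDFs: each one runs its own query against dbo.user_votes
-- for every row of dbo.users.
SELECT  u.user_id,
        dbo.fn_getUserScore(u.user_id) AS user_score,
        dbo.fn_getVoteCount(u.user_id) AS vote_count   -- hypothetical UDF
FROM    dbo.users u;

-- Same two values computed together with a single join to dbo.user_votes.
SELECT  u.user_id,
        SUM(uv.vote)   AS user_score,
        COUNT(uv.vote) AS vote_count
FROM    dbo.users u
LEFT JOIN
        dbo.user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id;
```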

The use of multiple functions says to me that the data model was made to be very "flexible". To me, that means questionable data typing and overall column/table definition. There's a need for functions/etc because the database will allow anything to be stored, which means the possibility of bad data is very high. I'd rather put the effort into always having good/valid data, rather than working after the fact to combat existing bad data.

The database is the place to contain this logic. It is faster than application code and, most importantly, centralized to minimize maintenance.

OMG Ponies answered Oct 13 '22 05:10