I'm using SQL UDFs to encapsulate simple reporting/business logic. Should I avoid this?

I'm building up a new database in SQL Server 2008 for some reporting, and there are many common business rules pertaining to this data that go into different types of reports. Currently these rules are mostly combined in larger procedural programs, in a legacy language, which I'm trying to move over to SQL. I'm shooting for flexibility in implementing reporting from this data, like some reporting in SAS, some in C#, etc.

My current approach is to break up these common rules (usually VERY simple logic) and encapsulate them in individual SQL UDFs. Performance is not a concern; I just want to use these rules to populate static fields in a sort of reporting "snapshot", which can then be reported from in whatever way you want.

I like this modular approach as far as understanding what each rule is doing (and maintaining the rules themselves), but I'm also starting to become a bit afraid that the maintenance may also become a nightmare. Some rules depend on others, but I can't really get away from that - these things build off each other...which is what I want...I think? ;)

Are there better approaches to this kind of modularity in a database? Am I on the right track, or am I thinking of this with too much of an application-development mindset?

asked Jan 28 '10 22:01

chucknelson

3 Answers

At some point, extensive use of UDFs will start to cause performance problems, as they are executed for each row in your resultset and obscure logic from the optimizer, making it hard to use indexes. (I don't really understand how performance cannot be an issue, but you know your requirements best.) For certain functionality they are great, but use them sparingly.
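To illustrate the point about indexes, here is a minimal sketch. The `orders` table, its `order_id` and `order_date` columns, and the function `dbo.fn_reportYear` are hypothetical names, not taken from the question. Wrapping the column in a scalar UDF typically prevents an index seek on it, while an equivalent range predicate leaves the column bare:

```sql
-- Hypothetical rule wrapped in a scalar UDF (all names are assumptions).
CREATE FUNCTION dbo.fn_reportYear(@d DATETIME) RETURNS INT
AS
BEGIN
    RETURN YEAR(@d)
END
GO

-- The function runs once per row and hides order_date from the optimizer,
-- so an index on order_date is unlikely to be used for a seek.
SELECT order_id
FROM   orders
WHERE  dbo.fn_reportYear(order_date) = 2009

-- The same rule as a range predicate on the bare column;
-- an index seek on order_date becomes possible.
SELECT order_id
FROM   orders
WHERE  order_date >= '20090101'
  AND  order_date <  '20100101'
```

`GO` is the batch separator used by SSMS and sqlcmd; it is needed because `CREATE FUNCTION` must be the first statement in its batch.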

answered Oct 13 '22 07:10

davek

Keeping logic on the database side is almost always the right thing to do.

As you mentioned in your question, most business rules involve quite simple logic but it usually deals with huge volumes of data.

The database engine is the right place to implement that logic because, first, it keeps data I/O to a minimum and, second, the database performs most data transformations much more efficiently.

Some time ago I wrote a very subjective blog post on this topic:

Schema Junk

One side note: a UDF is not the same as a stored procedure.

A UDF is a function designed to be callable inside a query, so it can perform only a very limited subset of possible operations.

You can do much more in a stored procedure.
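A rough sketch of that difference in T-SQL follows. Everything here is hypothetical and only for illustration: the `last_login` column on `users`, the `report_snapshot` table, and the names `dbo.fn_isActive` and `dbo.usp_refreshSnapshot` are assumptions, not part of the question.

```sql
-- A scalar UDF can be used inside a query, but it cannot modify table data.
CREATE FUNCTION dbo.fn_isActive(@last_login DATETIME, @as_of DATETIME) RETURNS BIT
AS
BEGIN
    RETURN CASE WHEN @last_login >= DATEADD(DAY, -30, @as_of) THEN 1 ELSE 0 END
END
GO

SELECT user_id, dbo.fn_isActive(last_login, GETDATE()) AS is_active
FROM   users
GO

-- A stored procedure is invoked with EXEC rather than from within a SELECT,
-- and it may insert, update, or delete rows.
CREATE PROCEDURE dbo.usp_refreshSnapshot
AS
BEGIN
    DELETE FROM report_snapshot

    INSERT INTO report_snapshot (user_id, is_active)
    SELECT user_id, dbo.fn_isActive(last_login, GETDATE())
    FROM   users
END
GO

EXEC dbo.usp_refreshSnapshot
```

The procedure mirrors the "snapshot" idea from the question: the rule lives in one function, and a single procedure repopulates the static reporting fields.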

Update:

In the example you gave, like changing logic that calculates a "derived field", the UDF that calculates the field is OK.

But (just in case) when performance becomes an issue (and believe me, this will be much sooner than one may think), transforming data with set-based operations may be much more efficient than using UDFs.

In this case, you may want to create a view, a stored procedure, or a table-valued function returning a resultset, which will contain a more efficient query, rather than limiting yourself to updating the UDFs (which are record-based).

One example: your query has something like a "user score" which you feel is subject to change, so you wrap it in a UDF:

-- Scalar UDFs in SQL Server must be called with their schema name.
SELECT  user_id, dbo.fn_getUserScore(user_id)
FROM    users

Initially, this is just a plain field in the table:

CREATE FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- T-SQL assigns to a variable with SELECT @var = ..., not SELECT ... INTO @var
        SELECT  @ret = user_score
        FROM    users
        WHERE   user_id = @user_id
        RETURN @ret
END

Then you decide to calculate it using data from another table:

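-- The function already exists at this point, so it is altered rather than created again.
ALTER FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- T-SQL assigns to a variable with SELECT @var = ..., not SELECT ... INTO @var
        SELECT  @ret = SUM(vote)
        FROM    user_votes
        WHERE   user_id = @user_id
        RETURN @ret
END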

In either case the function is evaluated once per row, which effectively condemns the engine to a NESTED LOOPS style of access, usually the least efficient option on large sets.

But suppose you create a view and rewrite the underlying queries like this:

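-- Initial version: the score is a plain column.
SELECT  user_id, user_score
FROM    users

-- Later version: the score is aggregated from another table.
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    users u
LEFT JOIN
        user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id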

This would give the engine much more room for optimization while still keeping the resultset structure and separating logic from presentation.
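As a sketch of how the aggregated version could be packaged as a view, assuming the `users` and `user_votes` tables from the example (the view name `dbo.v_userScore` is made up for illustration):

```sql
-- Hypothetical view wrapping the set-based query; callers select from it like a table.
CREATE VIEW dbo.v_userScore
AS
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    users u
LEFT JOIN
        user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id
GO

-- Reports keep the same two-column resultset whichever way the score is computed.
SELECT  user_id, user_score
FROM    dbo.v_userScore
```

If the scoring rule changes again, only the view definition needs to change, and the optimizer still sees a plain join and aggregate rather than an opaque per-row call.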

answered Oct 13 '22 06:10

Quassnoi

SQL is set-based, and inherently performs poorly when applying a modular approach.
Functions, stored procedures and/or views all abstract the underlying logic. The performance problem comes into play when you use two (or more) functions/etc. that utilize the same table(s). It means that two queries are made to the same table(s) when one could have been used.
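
A small sketch of that situation, borrowing the `users` and `user_votes` tables from the other answer. The second function, `dbo.fn_getUserVoteCount`, is hypothetical and assumed to count a user's rows in `user_votes`:

```sql
-- Two scalar functions that each query user_votes:
-- the table is visited twice for every row of users.
SELECT  u.user_id,
        dbo.fn_getUserScore(u.user_id)     AS user_score,
        dbo.fn_getUserVoteCount(u.user_id) AS vote_count
FROM    users u

-- One set-based query can produce both values from a single join to user_votes.
SELECT  u.user_id,
        SUM(uv.vote)   AS user_score,
        COUNT(uv.vote) AS vote_count
FROM    users u
LEFT JOIN
        user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id
```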

The use of multiple functions says to me that the data model was made to be very "flexible". To me, that means questionable data typing and overall column/table definition. There's a need for functions/etc because the database will allow anything to be stored, which means the possibility of bad data is very high. I'd rather put the effort into always having good/valid data, rather than working after the fact to combat existing bad data.

The database is the place to contain this logic. It is faster than application code and, most importantly, centralized, which minimizes maintenance.

answered Oct 13 '22 05:10

OMG Ponies

Tags:

sql

sql-server

user-defined-functions

modularity