Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Caching Function Results in SQL Server 2000

I want to memoize function results for performance, i.e. lazily populate a cache indexed on the function arguments. The first time I call a function, the cache won't have anything for the input arguments, so it will calculate it and store it before returning it. Subsequent calls just use the cache.

However, it seems that SQL Server 2000 has a stupid arbitrary rule about functions being "deterministic". INSERTs, UPDATEs, and regular stored procedure calls are forbidden. However, extended stored procedures are allowed. How is this deterministic? If another session modifies the database state, the function output will change anyways.

I'm steaming mad. I had thought I could make caching transparent to the user. Is this possible? I don't have the permissions to deploy extended stored procedures.

EDIT:

This limitation is still in 2008. You can't call RAND, for God's sake!

The cache would be implemented by me in the DB. A cache is any data store used for caching...

EDIT:

There are no cases where the same arguments to a function will yield different results, outside of changes to the underlying data. This is a BI platform, and the only changes come from scheduled ETL, at which time I would TRUNCATE the cache table.

These are I/O intensive time series calculations, on the order of O(n^4). I don't have the mandate to change the underlying table or indexes. Also, a lot of these functions use the same intermediate functions, and caching allows those to be used.

UDFs are not truly deterministic, unless they account for changes in database state. What's the point? Is SQL Server caching? (Ironic.) If SQL Server is caching, then it must be expiring on changes to tables that are schema bound. If they're schema bound, then why not bind tables that the function modifies? I can see why procs aren't allowed, although that's just sloppy; just schema bind procs. And, BTW, why allow extended stored procs? You can't possibly track what those do to ensure determinism!!! Argh!!!

EDIT:

My question is: Is there any way to lazily cache function results in a way that can be used in a view?

like image 324
jennykwan Avatar asked Mar 11 '09 20:03

jennykwan


1 Answers

Deterministic means that the same inputs return the same output independent of time and database.

SQL Server (any version) does no caching of UDFs - I believe it will avoid calling the UDF twice on a single row, but that's it.

One trick I've used is to (I think I posted it here on SO):

Refactor the UDF if you can so that there are effectively a usable discrete subset of values returned for a given set of inputs. For numerical calculations, one can sometimes refactor the logic to return a factor or rate which is multiplied outside the UDF instead of multiplied inside the UDF from a passed in value.

Call the UDF over the DISTINCT rowset and cache the results to a temporary table. If you are only calling the UDF with 100,000 tuples of parameters over a 17,000,000 row set, this is very much more efficient.

JOIN to the temporary table (basically converting from code-based logic to table-based logic) to get values.

This table can be re-used as necessary or even kept.

Addition to the table can be done by first LEFT JOINing to find missing cached entries.

This works for both single-row table-valued UDFs and scalar UDFs. I'm mainly using it for table-valued UDFs. There is a hotfix to SQL Server 2005 which is supposed to address the UDF performance - I'm waiting on mthe DBAs to test it before deploying to production.

like image 169
Cade Roux Avatar answered Oct 12 '22 22:10

Cade Roux