I'm trying to use LAG in T-SQL to compute some lagging features. I got a little worried when the LAG reference page says that this function is non-deterministic. The reference page on function determinism says that "specifying an ORDER BY clause in a query does not change the determinism of a function that used in that query". However, I don't see why LAG would return different results under the same condition. If it does, why would people use it? Maybe I'm not interpreting "determinism" correctly? Thanks!
According to the MSDN documentation, nondeterministic functions may return different results each time they are called with a specific set of input values even if the database state that they access remains the same, so this is not related to data changes (INSERT, DELETE, UPDATE).
However, Eric is right regarding the physical sort order. The physical sort order can vary from one query to the next, for example when there are duplicate values in the columns being ordered by. In that scenario LAG and LEAD can return different results depending on the chosen execution plan. On the other hand, the AVG function is deterministic, because it will always return the same result for the same data set regardless of sort order.
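To make the tie problem concrete, here is a small Python sketch that imitates what LAG and AVG do. It is not T-SQL and not how SQL Server implements either function. The rows, the column names (order_date, amount) and the helper functions are all made up for illustration. Both orderings below are valid outputs of ORDER BY order_date, and they differ only in how the two tied rows are arranged.

```python
# Hypothetical rows: (order_date, amount). Two rows tie on order_date.
# Both lists are valid results of ORDER BY order_date; only the tied rows swap.
ordering_a = [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-02", 30), ("2024-01-03", 40)]
ordering_b = [("2024-01-01", 10), ("2024-01-02", 30), ("2024-01-02", 20), ("2024-01-03", 40)]


def lag(values, offset=1, default=None):
    """Mimic LAG: for each position, return the value `offset` rows earlier."""
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


def lag_by_row(rows):
    """Map each row to the previous row's amount under the given ordering."""
    amounts = [amount for _, amount in rows]
    return dict(zip(rows, lag(amounts)))


def average(rows):
    """Mimic AVG(amount): the result does not depend on row order."""
    return sum(amount for _, amount in rows) / len(rows)


lag_a = lag_by_row(ordering_a)
lag_b = lag_by_row(ordering_b)

# The same row gets a different lagged value depending on how the tie was broken.
print(lag_a[("2024-01-03", 40)], lag_b[("2024-01-03", 40)])  # 30 20
# The average is identical under both orderings.
print(average(ordering_a) == average(ordering_b))             # True
```

The data is identical in both cases. Only the arrangement of the tied rows changes, which is exactly the part an ORDER BY with ties leaves unspecified.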
In mathematics and physics, a deterministic system is a system in which no randomness is involved in the development of future states of the system. A deterministic model will thus always produce the same output from a given starting condition or initial state. https://en.wikipedia.org/wiki/Deterministic_system
Eric is correct: the LAG function itself is not deterministic, because its results can change depending on the data state. In some data models, and when applied correctly, it can behave deterministically (for example, if you order by a unique numeric key in your LAG), but the function definition by itself is not deterministic.
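As a rough check of the "order by a key" idea, the Python sketch below enumerates every row ordering that is consistent with a given sort key and counts how many distinct LAG outcomes are possible. Again, this is only an imitation of LAG in Python. The table layout (id, order_date, amount) and the helper names are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical rows: (id, order_date, amount). id is unique, order_date has a tie.
rows = [(1, "2024-01-01", 10), (2, "2024-01-02", 20), (3, "2024-01-02", 30), (4, "2024-01-03", 40)]


def lag(values, offset=1, default=None):
    """Mimic LAG: for each position, return the value `offset` rows earlier."""
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


def distinct_lag_results(rows, sort_key):
    """Collect the LAG(amount) outcome of every ordering consistent with sort_key."""
    results = set()
    for ordering in permutations(rows):
        keys = [sort_key(row) for row in ordering]
        if keys != sorted(keys):
            continue  # this arrangement is not a valid ORDER BY result
        lagged = lag([amount for _, _, amount in ordering])
        ids = [row_id for row_id, _, _ in ordering]
        # Key the outcome by row id so it can be compared across orderings.
        results.add(tuple(sorted(zip(ids, lagged))))
    return results


by_date = distinct_lag_results(rows, lambda row: row[1])
by_date_then_id = distinct_lag_results(rows, lambda row: (row[1], row[0]))

print(len(by_date), len(by_date_then_id))  # 2 1
```

Ordering by order_date alone leaves two possible outcomes, while adding the unique id as a tiebreaker leaves exactly one. In a query, that corresponds to listing a unique column last in the ORDER BY inside the OVER clause.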
Make sense?