in order to make the following type of assertion
create assertion assert
check "EMPTY SET" = (select User
from Video
where date=current_date()
group by user
having count(*) >= 10
is this assertion right?
create assertion assert
check 0 = (select count(*)
from Video
where date=current_date()
group by user
having count(*) >= 10
ASSERTIONS IN SQL An assertion is a piece of SQL which makes sure a condition is satisfied, else or it stops the action being taken on a database. An assertion is a constraint that might be dependent upon multiple rows of multiple tables.
It is faster to run a Transact-SQL assertion on the server than to first move data to a client computer and manipulate it locally. Familiarity with language. You might prefer a particular language based on your current expertise and therefore choose Transact-SQL assertions or Visual C# or Visual Basic test conditions. Complicated validation.
If an assertion returns any rows, future steps in a pipeline should either fail, or a notification delivered to the data owner. Dataform has built in support for data assertions, and provides a way to run them as part of a larger SQL pipeline.
Assertions should be run as part of any data pipelines to make sure breaking changes are picked up the moment they happen. If an assertion returns any rows, future steps in a pipeline should either fail, or a notification delivered to the data owner.
For full details on CREATE ASSERTION
see the ISO SQL-92 Standard spec.
The CHECK
definition should be in parentheses.
CURRENT_DATE
does not have parentheses.
USER
and DATE
are reserved words.
SQL statements should be terminated with a semicolon character.
SQL Keywords should be in upper case.
Try something more like this:
CREATE ASSERTION assert
CHECK (0 = (
SELECT COUNT(*)
FROM Video
WHERE my_date = CURRENT_DATE
GROUP
BY my_user
HAVING COUNT(*) >= 10
));
You can test that the syntax is correct using the online Mimer SQL-92 Validator. However, you should also test your logic e.g. CURRENT_DATE
is non-deterministic.
Also, I don't think this ASSERTION
will ever bite. When the cardinality of the subquery is less than 10 it will return zero rows and 0 = empty set
will evaluate to UNKNOWN
. When the cardinality of the subquery is 10 or greater then the search condition will evaluate TRUE
. SQL-92 Standard states
The assertion is not satisfied if and only if the result of evaluating the search condition is false.
Note you can replace your CHECK (0 = (SELECT COUNT(*) FROM...))
construct with CHECK (NOT EXISTS (SELECT * FROM...))
, the latter of which I find easier to write.
UPDATE:
How should I write the assertion using CHECK NOT EXISTS ?
As I said above, your logic appears flawed so it is hard to implement properly ;)
Let's say the rule is to limit Videos to 10 per user per day. Because this involves only a single table, it would be more appropriate to use a table-level CHECK
constraint; such a constraint is checked when the table is updated which is sufficient in this case (there's no reason why it couldn't be an ASSERTION
, though, which in theory could be checked each time any table in the schema is updated):
ALTER TABLE Video ADD
CONSTRAINT video_limit_10_per_user_per_day
CHECK (NOT EXISTS (
SELECT v1.my_user, v1.my_date
FROM Video AS V1
GROUP BY v1.my_user, v1.my_date
HAVING COUNT(*) > 10
));
UPDATE 2:
thanks,now let's say we want to limit videos to 100 per user per year, in this case using current_date would be necessary wouldn't it?
Consider again that a CHECK
/ASSERTION
will only be checked when data in the table/schema is updated. The problem with using CURRENT_DATE
(and other non-determninistic functions) in a constraint is that the business rule can be invalidated simply by the clock ticking over from one time period to the next but if the data hasn't been changed in that period then the data integrity failure will not be detected and the database will contain invalid data.
Another consideration is what is meant by a year in context.
It could be the calendar year (1 Jan to 31 Dec inclusive) or other other fixed dates defined by enterprise (e.g. 1 Apr to 31 Mar inclusive), in which case grouping by year and user then counting is trivial.
A more interesting case is when the rule limits the tally for any 12 month period; extending this to both the past and future avoid the above 'non-deterministic' issue.
Consider a standard approach of using an auxiliary calendar table, containing one row for every day applicable to the enterprise, extended into the past and future only as far as required should still only comprise a few thousand rows. Each row would have the date as a key with a second column for that date plus one year (and if necessary you could fine tune the definition of 'a year' at one-day granularity!) The test for would involve joining to the Calendar table, grouping on the calendar date and user and counting e.g. something like this:
SELECT C1.dt, V1.my_user
FROM Video AS V1
INNER JOIN Calendar AS C1
ON (V1.my_date BETWEEN C1.dt AND C1.dt_plus_one_year)
GROUP
BY C1.dt, V1.my_user
HAVING COUNT(*) > 100;
This could be put in a CHECK (NOT EXISTS (...
constraint. This could still be a table-level CHECK
constraint: because the Calendar table is an auxiliary table it would only be suject to infrequent controlled updates (but again could be an ASSERTION
if required).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With