Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why "first" or "any" aggregate functions are not commonly used in database engines?

Tags:

sql

database

Some database engines like Microsoft Access supports FIRST() as an aggregate function and I was using it in cases I know the column will only have one value in the group.

Potentially, the database engine can optimise this as if it reaches any value it can mark this value as already calculated. So it is a surprise why this is not supported in for example Oracle or SQL Server, and more importantly, not in the SQL standard.

In practice, people uses MIN() or MAX() instead, but they all require

  1. The data type underneath have natural ordering semantic and the ordering does matter for the user;

  2. The database engine have to compare the intermediate value with the values in each rows

So this is not optimal in many cases.

Are there any specific reasons people don't want to allow SELECT ANY(FIELD) ...? (I could think of two variants: ANY() gives any value in the result set that the column is not null; FIRST() gives the column value for the first row in result set, or null if there is no rows)

like image 726
Earth Engine Avatar asked Oct 18 '17 05:10

Earth Engine


People also ask

Is there an aggregate function in SQL Server?

Standard SQL has an aggregate function any but for a different use case. Again, what you (MS Access SQL) suggest for any gives you a non-deterministic result, which is not what standard SQL encourages.

Can first () be used as an aggregate function?

Bookmark this question. Show activity on this post. Some database engines like Microsoft Access supports FIRST () as an aggregate function and I was using it in cases I know the column will only have one value in the group. Potentially, the database engine can optimise this as if it reaches any value it can mark this value as already calculated.

What is the use of the any function in SQL?

Standard SQL has an aggregate function any but for a different use case. Again, what you (MS Access SQL) suggest for any gives you a non-deterministic result, which is not what standard SQL encourages. The standard SQL function any returns a boolean that is true if any of the conditions is true. It is best used in having clauses:

Can first () be used to aggregate columns with only one value?

Some database engines like Microsoft Access supports FIRST () as an aggregate function and I was using it in cases I know the column will only have one value in the group. Potentially, the database engine can optimise this as if it reaches any value it can mark this value as already calculated.


2 Answers

Regarding first/last

The syntax supported by Microsoft Access SQL doesn't make sense in standard SQL:

SELECT  
       First(LastName) as First,
       Last(LastName) as Last
  FROM Employees

(source)

In Standard SQL, grouping takes places before sorting. Normally, groups are not sorted. That means, it is undefined which row comes first/last. Standard SQL generally aims to avoid constructs that have nondeterministic behaviour (exceptions exist).

Standard SQL offers to so-called ordered set functions that accept an within group (order by...) clause to establish an order in a group prior to aggregation:

SELECT
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val)
  FROM ...

The range for the argument of percentile_disc is 0 to 1 whereas 0 is the first result and 1 the last. 0.5 is the median (this is the common use-case for percentile_disc).

However, standard SQL does not offer first/last as ordered set function, but percentile_disc with an argument of 0 is basically first, while the value 1 would basically give you the last result.

The more common SQL way to obtain the first/last value is to use a top-n query:

SELECT LastName
  FROM Employees
 ORDER BY ...
 FETCH FIRST 1 ROW ONLY

Fetching first and last value in one go is a little bit awkward.

Other than that, standard SQL also offers the window functions first_value and last_value to pick those values out of a partition without grouping.

Regarding any

Standard SQL has an aggregate function any but for a different use case. Again, what you (MS Access SQL) suggest for any gives you a non-deterministic result, which is not what standard SQL encourages.

The standard SQL function any returns a boolean that is true if any of the conditions is true. It is best used in having clauses:

SELECT
       *
  FROM ..
 GROUP BY ...
HAVING ANY(<condition>)

This remove all groups where no <condition> evaluates to true.

References:

  • Slides regarding WITHIN GROUP: https://www.slideshare.net/MarkusWinand/modern-sql/105
  • Blog post on the every function (which is similar to any): https://blog.jooq.org/2014/12/18/a-true-sql-gem-you-didnt-know-yet-the-every-aggregate-function/
like image 176
Markus Winand Avatar answered Oct 24 '22 20:10

Markus Winand


FIRST (or better ANY_VALUE as MySQL calls the function) is not in the SQL standard. This is probably because it is hardly needed in a standard-compliant DBMS.

You say you use FIRST "in cases I know the column will only have one value in the Group". In a well-built database such a case should hardly ever occur. Maybe you are rather using it, because MS Access (and several other DBMS) violate the standard when it comes to aggregation with a GROUP BY clause. An example:

select department_id, d.department_name, count(*) as num_employees
from employees e
join departments d using (department_id)
group by d.department_id;

You may want to use FIRST(d.department_name) here, because you know that per department_id there will be just one department_name of course. But so does the DBMS (or better: it should!). In standard SQL the above query is valid, because the department_name is functionally dependent on the department_id. No need hence for a FIRST or ANY_VALUE function.

MySQL introduced ANY_VALUE mainly in order to deal with cases where the DBMS fails to detect the functional dependency, but again these cases should be extremely rare. I like the function, because it gives you the opportunity to say "I don't care which", e.g. give me the departments and one leader per department (i.e. in case there are two department leaders: one of them arbitrarily chosen). But well, I guess in the standard comittee they decided that MIN or MAX would serve the purpose in such rare cases, too.

like image 26
Thorsten Kettner Avatar answered Oct 24 '22 21:10

Thorsten Kettner