Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the big O of Oracles MAX function?

Is the time complexity of the Oracle MAX function O(1), O(log n) or O(n) with respect to the number of rows in a table?

like image 893
ceving Avatar asked Jun 28 '12 17:06

ceving


People also ask

What is Max function Oracle?

MAX is an aggregate function that evaluates the maximum of an expression over a set of rows (see Aggregates (set functions)). MAX is allowed only on expressions that evaluate to built-in data types (including CHAR, VARCHAR, DATE, TIME, CHAR FOR BIT DATA, etc.).

Which clause is used for Max function?

The MAX() function is used with the WHERE clause to gain further insights from our data. In SQL, the MAX() function computes the highest or maximum value of numeric values in a column.

What does Max () mean in SQL?

SQL MIN() and MAX() Functions The MIN() function returns the smallest value of the selected column. The MAX() function returns the largest value of the selected column.

How does Plsql calculate max salary?

Try using this SQL SELECT statement: SELECT * FROM employees WHERE department_id=30 AND salary = (SELECT MAX(salary) FROM employees WHERE department_id=30); This will return the employee information for only the employee in department 30 that has the highest salary.


2 Answers

If you have a B-tree index on the column then finding the maximum value is O(log(n)) because the answer will be the last (or first) row of the index. Values are stored in the deepest nodes of the B-tree which has a height O(log(n)).

Without an index it is O(n) because all rows must be read to determine the maximum value.


Note: The O(n) notation ignores constants but in the real world these constants cannot be ignored. The difference between reading from disk and reading from memory is several orders of magnitude. Accessing the first value of an index is likely to be performed mostly in RAM, whereas a full table scan of a huge table will need to read mostly from disk.

like image 66
Mark Byers Avatar answered Oct 27 '22 09:10

Mark Byers


Realistically, it's hard to say without specifying a query, a table definition, and a query plan.

If you have a table that has no index on the column you're computing the MAX on, Oracle will have to do a full table scan. That is going to be O(n) since you've got to scan every block in the table. You can see that by looking at the query plan.

We'll generate a table with 100,000 rows and ensure that the rows are reasonably large using a CHAR(1000) column

SQL> create table foo( col1 number, col2 char(1000) );

Table created.

SQL> insert into foo
  2    select level, lpad('a',1000)
  3      from dual
  4   connect by level <= 100000;

100000 rows created.

Now, we can look at the plan for the basic MAX operation. This is doing a full table scan (an O(n) operation)

SQL> set autotrace on;
SQL> select max(col1)
  2    from foo;

 MAX(COL1)
----------
    100000


Execution Plan
----------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |  4127   (1)| 00:00:50 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| FOO  |   106K|  1350K|  4127   (1)| 00:00:50 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         29  recursive calls
          1  db block gets
      14686  consistent gets
          0  physical reads
        176  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

If you create an index on the column you're computing the MAX of, Oracle can do a MIN/MAX scan on the index. That is an O(log n) operation if that's the plan the optimizer chooses. Of course, as a practical matter, this is functionally an O(1) operation because the height of an index is never realistically going to exceed 4 or 5-- the constant terms here are going to dominate.

SQL> create index idx_foo_col1
  2      on foo( col1 );

Index created.

SQL> select max(col1)
  2    from foo;

 MAX(COL1)
----------
    100000


Execution Plan
----------------------------------------------------------
Plan hash value: 817909383

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |              |     1 |    13 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 |     1 |    13 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         83  consistent gets
          1  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

But then things get harder. Both MIN and MAX have the same O(log n) behavior individually. But if you have both MIN and MAX in the same query, suddenly you're back to an O(n) operation. Oracle (as of 11.2) hasn't implemented an option grab both the first block and the last block of an index

SQL> ed
Wrote file afiedt.buf

  1  select min(col1), max(col1)
  2*   from foo
SQL> /

 MIN(COL1)  MAX(COL1)
---------- ----------
         1     100000


Execution Plan
----------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |  4127   (1)| 00:00:50 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| FOO  |   106K|  1350K|  4127   (1)| 00:00:50 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
      14542  consistent gets
          0  physical reads
          0  redo size
        601  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Of course, in subsequent versions of Oracle, this optimization might be implemented and this would go back to being an O(log n) operation. Of course, you can also rewrite the query to get a different query plan that goes back to being O(log n)

SQL> ed
Wrote file afiedt.buf

  1  select (select min(col1) from foo) min,
  2         (select max(col1) from foo) max
  3*   from dual
SQL>
SQL> /

       MIN        MAX
---------- ----------
         1     100000


Execution Plan
----------------------------------------------------------
Plan hash value: 3561244922

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |       |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |              |     1 |    13 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |  SORT AGGREGATE            |              |     1 |    13 |            |          |
|   4 |   INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |  FAST DUAL                 |              |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------


Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        166  consistent gets
          0  physical reads
          0  redo size
        589  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
like image 32
Justin Cave Avatar answered Oct 27 '22 10:10

Justin Cave