
Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.

In my database, there is a table that is structured as follows:

Table tab (
  rowid NUMBER(11),
  unique_id_string VARCHAR2(2000),
  year NUMBER(4),
  dynamic_col_1 NUMBER(11),
  dynamic_col_1_text NVARCHAR2(2000)
 ) TABLESPACE tabspace_data;

I have created two indexes:

CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;

The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:

SELECT distinct
 "sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM 
(
    SELECT "tab".*  FROM "tab"
    where "tab".year = 2011
) "sub_select"

Unfortunately, the query needs around 1 hour to execute, even though I created both indexes described above. The explain plan shows that Oracle uses "TABLE ACCESS FULL", i.e. a full table scan. Why is the index not used?

As an experiment, I tested the following SQL command:

SELECT DISTINCT
 "dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
 FROM "tab"

Even in this case, the index is not used and a full table scan is performed.

In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text". The whole index file has a size of about 50 GB.

Some more information:

  • The database is Oracle 11g installed on my local computer.
  • I use Windows 7 Enterprise 64bit.
  • The whole index is split over 3 dbf files totaling about 50 GB.

I would be really glad if someone could tell me how to make Oracle use the index in the first query. Because the first query is issued by another program to extract the data from the database, it can hardly be changed, so it would be better to tweak the table instead.

Thanks in advance.

[01.10.2011: UPDATE]

I think I've found the solution to the problem. Both columns dynamic_col_1 and dynamic_col_1_text were nullable. After altering the table to prohibit NULL values in both columns and adding a new index solely on the column year, Oracle performs a Fast Index Scan. The query now takes about 5 seconds to execute instead of 1 hour.
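The changes described in the update might look roughly like the following. This is only a sketch: it reuses the table and tablespace names from the question, the index name index_year is hypothetical, and the ALTER statement would fail if either column still contained NULLs. The NOT NULL constraints matter because an Oracle B-tree index does not store entries whose indexed columns are all NULL, so without the constraints the optimizer cannot assume the index contains every row.

```sql
-- Assumes no existing row has NULL in either column; otherwise this fails.
ALTER TABLE tab MODIFY (
  dynamic_col_1      NOT NULL,
  dynamic_col_1_text NOT NULL
);

-- Hypothetical index name; a separate index on year alone, as in the update.
CREATE INDEX index_year ON tab (year) TABLESPACE tabspace_index;
```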

oracle_user54 asked Sep 24 '11 13:09



3 Answers

Your index should be:

CREATE INDEX Index_year 
ON tab (year) 
TABLESPACE tabspace_index;

Also, your query could just be:

SELECT DISTINCT
       dynamic_col_1 "AS_dynamic_col_1",
       dynamic_col_1_text "AS_dynamic_col_1_text"
  FROM tab
 WHERE year = 2011;

If your index was created solely for this query, though, you could also include the two fetched columns in it. The optimiser would then not have to visit the table at all; it could read the query data directly from the index, making your query more efficient still.
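A covering index of that kind could be sketched as below. The index name is hypothetical, and whether the optimiser actually picks it depends on statistics and on the nullability of the columns.

```sql
-- Hypothetical name; year leads so the WHERE clause can use the index,
-- and the two selected columns follow so the table need not be visited.
CREATE INDEX index_year_dyn_col1
ON tab (year, dynamic_col_1, dynamic_col_1_text)
TABLESPACE tabspace_index;
```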

Hope it helps...

Ollie answered Sep 22 '22 06:09


I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
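As a sketch of that suggestion, the compound index from the question could be recreated with the columns swapped. The index name here is hypothetical, and this is untested guesswork in the same spirit as the answer.

```sql
-- Hypothetical name; same columns as Index_unique_id_year, with year first.
CREATE INDEX index_year_unique_id
ON tab (year, unique_id_string)
TABLESPACE tabspace_index;
```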

Dan answered Sep 20 '22 06:09


Are you sure that an index access would be faster than a full table scan? As a very rough estimate, a full table scan reads rows about 20 times faster than index access does. If more than about 5% of the rows in tab are from 2011, it's not surprising that Oracle chooses a full table scan. And as @Dan and @Ollie mentioned, having year as the second column makes the index even slower.
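One way to check where the data falls relative to that rough 5% threshold is to count rows per year. This is a minimal sketch using the table from the question; the column aliases are made up for illustration.

```sql
-- Share of the table that each year accounts for, in percent.
SELECT year,
       COUNT(*) AS row_count,
       ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 2) AS pct_of_table
  FROM tab
 GROUP BY year
 ORDER BY year;
```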

If the index really is faster, then the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:

  1. Run an explain plan with and without an index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
  2. If the cardinality is off, make sure there are up-to-date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
  3. If the time is off, check your workload statistics.
  4. Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
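The first two checks could be sketched as follows. This assumes the table was created with unquoted identifiers (so it is stored as TAB) in the current schema, and it hints the existing Index_unique_id_year index purely as an example; adjust the names to the real schema.

```sql
-- Plan without a hint.
EXPLAIN PLAN FOR
SELECT DISTINCT dynamic_col_1, dynamic_col_1_text
  FROM tab
 WHERE year = 2011;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan with an index hint, to compare estimated cardinality and time.
EXPLAIN PLAN FOR
SELECT /*+ INDEX(tab Index_unique_id_year) */ DISTINCT
       dynamic_col_1, dynamic_col_1_text
  FROM tab
 WHERE year = 2011;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Refresh table and index statistics with the automatic sample size.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TAB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);  -- also gathers stats on the table's indexes
END;
/
```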

Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
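To illustrate the point about quoted identifiers, here is a small sketch with two hypothetical tables that are not part of the question's schema.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE demo_unquoted (year NUMBER(4));    -- stored as DEMO_UNQUOTED.YEAR
CREATE TABLE "demo_quoted" ("year" NUMBER(4));  -- stored exactly as written

SELECT year FROM demo_unquoted;      -- works
SELECT YEAR FROM Demo_Unquoted;      -- works: unquoted names are case-insensitive
SELECT "year" FROM "demo_quoted";    -- works, but the quotes are now mandatory
SELECT year FROM demo_quoted;        -- fails with ORA-00942
```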

Jon Heller answered Sep 21 '22 06:09