Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle composite index for range query conditions

I had a table Blah ( latitude float, longitude float, create_time date, owner_id int , ..... )

and my code does only a single query

select * 
from Blah 
where latitude < l1 and latitude > l2   
and longitude < ll1 and longitude > ll2   
and create_time < t1 and create_time > t2 
and owner_id < o1 and owner_id > o2 ;

(of course the values l1, l2,.... o1,o2 are dynamic params coming from program)

my question is what kind of index I should create; composite index? in case of composite index, which column should I put first? how effective is the index?

I thought about this for a long time, and couldn't find detailed docs on how the oracle index works.

I can find docs that it's implemented using B-tree, in our case: each key in the B-tree is a 4-tuple: ( column1, column2, column3, column4) where the ordering relation of such tuples are defined to be lexical order.

then for the above query, assuming our order is (owner_id, create_time, latitude, longitude), I guess oracle would first need to binary search to the point ( o1, t1, l1,ll1), for this operation, the index is indeed useful. but next, we need to find the ending point of this first interium: we need to find (o1,t1, l1, ll2 ), this can be done by binary search too.

next, we need to find the next section that satisfies the condition, so we need to find (o1, t1, lx, ll1 ) where lx is the next value larger than l1, we could find this by binary search too. but in our case, it's very likely that for the same latitude, there can be no more than 1 longitude, so here binary search is not more effective than linear scan.

following this spirit, it seems that we should put the column with a small value range cardinality first, in this case, create_time, if our points are created on only a few days. also if we never do range conditions, but only equals (=) conditions, then it does not matter which column is first, right?

to make it clearer, here is a simpler example:

let's say I have 2 columns, X, and Y

in the db, values for both are [1,2,....100], so we have 100x100 rows

my query is

select * from mytable where X > 34 and X < 78 and Y > 12 and Y < 15;

say our index is on (X, Y), so the comparison rule between 2 values are

v1 < v2 <=====>  v1.x < v2.x || v1.x == v2.x && v1.y < v2.y

given the above ordering rule, we can see that the values in the index are arranged in serial like (values for x,y):

1,1, 1,2 1,3 .... 1,100     
2,1  2,2 2,3 ......2,100
.....
100,1 100,2 ....... 100,100

now , to search for the values in the query, the B-Tree traversal needs to locate (78-34-1) intervals, hence (78-34-1)*2 lookup (1 for the beginning one for the end locations), not just 2 lookups.

so if we have higher dimensions, the interval counts increases exponentially with the number of dimensions, so indexing may not be useful anymore ------ this is my concern

thanks a lot Yang

like image 594
teddy teddy Avatar asked May 04 '12 16:05

teddy teddy


2 Answers

If your only goal is to create an index to optimize this query, you'd prefer that the columns in the composite index be ordered with the most selective column first. If the predicates on latitude eliminate substantially more rows than the other predicates, it will be more efficient to have that column first. If the predicates on owner_id eliminate substantially more rows than the other predicates, it will be more efficient to have that column first.

In reality, though, we are rarely creating indexes whose only purpose is to optimize a single query. Generally, in order to make the overhead of index maintenance worthwhile, we want our indexes to be useful across many queries. In the case of a composite index, that means ordering the columns by the probability that a query will have predicates on that column. If you have a composite index on owner_id, create_time, latitude, longitude, for example, you can use that for queries that just specify predicates on owner_id. But you would not, realistically, use that index for queries that just specify predicates on longitude.

like image 64
Justin Cave Avatar answered Sep 22 '22 23:09

Justin Cave


First, bear in mind that the "B" in "B-Tree" is not "binary".

Second, when it comes to indexing in Oracle you also have the choice of a bitmap index if:

  1. You have an enterprise edition license
  2. You do not have many sessions concurrently modifying the table
  3. Your indexed values are not close to being unique (statements that bitmap indexes are usable only for low cardinality columns are generally exaggerated)

One type of query that bitmap indexes excel at is in efficiently combining predicates on multiple columns, especially where the set of predicated columns varies (which may not be the case for you of course). If you meet the three conditions above then it would be worth testinig the effect of having four separate bitmap indexes on the table.

like image 42
David Aldridge Avatar answered Sep 24 '22 23:09

David Aldridge