
LOW_VALUE and HIGH_VALUE in USER_TAB_COLUMNS

I have a question regarding the columns LOW_VALUE and HIGH_VALUE in the view USER_TAB_COLUMNS (or equivalent).

I was just wondering if these values are always correct. For example, if you have a column with 500k rows with a value of 1, 500k rows with a value of 5, and 1 row with a value of 1000, the LOW_VALUE should be 1 (after you convert the raw figure) and the HIGH_VALUE should be 1000 (likewise). However, are there any circumstances where Oracle would 'miss' this outlier value and instead have 5 for HIGH_VALUE?
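(By "convert the raw figure" I mean something along these lines — a sketch that assumes a NUMBER column; `SOME_TABLE` is a placeholder:)

```sql
-- UTL_RAW.CAST_TO_NUMBER only works for NUMBER columns; other data
-- types need DBMS_STATS.CONVERT_RAW_VALUE instead
SELECT column_name,
       utl_raw.cast_to_number(low_value)  AS low_value_num,
       utl_raw.cast_to_number(high_value) AS high_value_num
  FROM user_tab_columns
 WHERE table_name = 'SOME_TABLE'
   AND data_type  = 'NUMBER';
```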

Also, what is the purpose of these 2 values?

Thanks

asked Jan 03 '12 by BYS2

1 Answer

As with all optimizer-related statistics, these values are estimates with varying degrees of accuracy from whenever statistics were gathered on the table. As such, it is entirely expected that they would be close but not completely accurate and entirely possible that they would be wildly incorrect.

When you gather statistics, you specify the percentage of the rows (or blocks) that should be sampled. It is possible to specify a 100% sample size, in which case Oracle examines every row, but it is relatively rare to ask for a sample size that large; it is much more efficient to ask for a smaller sample (either explicitly or by letting Oracle determine the sample size automatically). If your sample of rows happens not to include the one row with a value of 1000, the HIGH_VALUE would not be 1000; it would be 5, assuming that is the largest value the sample saw.
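As a sketch, the two extremes look something like this (`SOME_TABLE` is a placeholder for your actual table):

```sql
-- 100% sample: every row is examined, so LOW_VALUE/HIGH_VALUE are
-- exact as of the moment the statistics were gathered
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'SOME_TABLE',
    estimate_percent => 100
  );
END;
/

-- Let Oracle pick the sample size (the more common approach);
-- outlier values may then be missed by the sample
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'SOME_TABLE',
    estimate_percent => dbms_stats.auto_sample_size
  );
END;
/
```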

Statistics are also a snapshot in time. By default, 11g gathers statistics every night on objects that have undergone enough change since statistics were last gathered to warrant refreshing them, though you can disable that job or change its parameters. So if you gather statistics today with a 100% sample size in order to get a HIGH_VALUE of 1000, then insert one row with a value of 3000 and never modify the table again, it is likely that Oracle would never gather statistics on that table again (unless you explicitly requested it to) and that the HIGH_VALUE would remain 1000 forever.
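You can check when statistics were last gathered, and whether Oracle currently considers them stale, with something like this (again, `SOME_TABLE` is a placeholder):

```sql
-- LAST_ANALYZED shows when the snapshot was taken; STALE_STATS
-- flips to 'YES' once enough of the table has changed since then
SELECT table_name, last_analyzed, stale_stats
  FROM user_tab_statistics
 WHERE table_name = 'SOME_TABLE';
```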

Assuming that there is no histogram on the column (which is another whole discussion), Oracle uses the LOW_VALUE and HIGH_VALUE to estimate how selective a particular predicate would be. If the LOW_VALUE is 1, the HIGH_VALUE is 1000, there are 1,000,000 rows in the table, there is no histogram on the column, and you run a query like

SELECT *
  FROM some_table
 WHERE column_name BETWEEN 100 and 101

Oracle will guess that the data is uniformly distributed between 1 and 1000 so that this query would return 1,000 rows (multiplying the number of rows in the table (1 million) by the fraction of the range the query covers (1/1000)). This selectivity estimate, in turn, would drive the optimizer's determination of whether it would be more efficient to use an index or to do a table scan, what join methods to use, what order to evaluate the various predicates, etc. If you have a non-uniform distribution of data, however, you'll likely end up with a histogram on the column which gives Oracle more detailed information about the distribution of data in the column than the LOW_VALUE and HIGH_VALUE provide.
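You can see that cardinality estimate directly in the execution plan. A sketch, with `some_table` and `column_name` standing in for your actual objects:

```sql
EXPLAIN PLAN FOR
SELECT *
  FROM some_table
 WHERE column_name BETWEEN 100 AND 101;

-- The Rows column of the plan output shows the optimizer's estimate
-- (roughly 1,000 here, given the statistics described above)
SELECT *
  FROM table(dbms_xplan.display);
```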

answered Oct 19 '22 by Justin Cave