In Silbershatz Database System Concepts 6th Ed., in chapter 12, par. 12.3.1, p. 542, there is an explanation of an algorithm for processing a query which is selecting by imposing equality constraint on a nonkey attribute of the relation, using primary index.
The paragraph claims that the read from the file will be consecutive, since the file is sorted by the search key.
I don't understand - why would the read be consecutive?
As I see it, the records are ordered by the clustering key of the primary index, and the selection is using the non-key attributes, so these attributes may be contained in each record of the relation. The only way I see to retrieve all the records is a linear scan of all the relation.
In this context, primary index, equality on non-key
means that we have a primary index on some attribute, namely A
(i.e. the records are sorted physically on the disk based on A
). Here, non-key
means that there may exist multiple records having the same value for the attribute A
, in other words, A
is not guaranteed to be unique. The selection, however, is indeed using equality on A
the clustering key of the primary index.
Thus, the algorithm becomes as follows: use the index to get the first record that satisfies the corresponding equality condition, then use linear scan until the condition breaks.
To make sure this makes sense, Look at algorithm A4:
In short, here non-key
only means that the indexing field is not unique.
You can further refer to slide 13 in these slides. And see how the name is phrased.
As a side note: notice the difference between the cost written on this slide and the cost written in the book which misses one ts
, but it really shouldn't miss it as it is explicitly present in the cost of A2, and clearly stated in the explanation, in Figure 12.3, in page 543. This may be a typo in the book. It isn't really of any importance, I just wanted to point it out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With