I found the following in ML documentation: a range index lets the server map values to fragments, and fragments to values...The former capability is used to support "range predicates" ....The latter is used to support fast order by operations.. Can anyone please explain this to me.Some sort of diagram depicting how this mapping is maintained would be very helpful.
Yes, do read Jason's excellent paper for all the detail of the inner workings of MarkLogic.
A simple summary of range indexes is this: A range index is a sorted term list. Term lists are an inverted index of values stored in documents. For word indexes, for example, a list of terms (a term list) is created that contains all the words in all the documents. Each term in the list is a word, say "humdinger", and an associated set of fragment IDs where that word occurs. When you do a word search for "humdinger", ML checks the term lists to find out which fragments that word occurs in. Easy. A more complex search is simply the set intersections of all the matched terms from all applicable term lists.
Most "regular" indexes are not sorted, they're organized as hashes to make matching terms efficient. They produce a set of results, but not ordered (relevance ordering is applied after). A range index on the other hand, is a term list that's sorted by the values of its terms. A range index therefore represents the range of unique values that occur in all instances of an element or attribute in the database.
Because range index term lists are ordered, when you get matches in a search you not only know which fragments they occur in, you also know the sorted order of the possible values for that field. MarkLogic's XQuery is optimized to recognize when you've supplied an "order by" clause that refers to a element or attribute which is range indexed. This lets it sort not by comparing the matched documents, but by iterating down the sorted term list and fetching matched documents in that order. This makes it much faster because the documents themselves need not be touched to determine their sort order.
But wait, there's more. If you're paginating through search results, taking only a slice of the matching results, then fast sorting by a range indexed field helps you there as well. If you're careful not to access any other part of the document (other than the range index element) before applying the page-window selection predicate, then the documents outside that window will never need to be fetched. The combination of pre-sorted selection and fast skip ahead is really the only way you can efficiently step through large, sorted result sets.
Range indexes have one more useful feature. You can access their values as lexicons, enumerating the unique values that occur in a given element or attribute across your entire database but without every actually looking inside any documents. This comes in handy for things like auto-suggest and getting counts for facets.
I hope that clarifies what range indexes are.
Take a look at Jason Hunter's writeup in Inside MarkLogic Server. There's a whole section on range indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With