I have read Permuterm indexes page on stanford website, however I can't still figure out how we can reach from: *X*
to X*
.
So where is the $
?
I can get these ones:
For X, look up X$
For X*, look up $X*
For *X, look up X$*
For X*Y, look up Y$X*
The Permuterm index [Garfield 1976] is a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size.
k-gram index is more space-efficient permuterm index does not require postfiltering.
A k-gram index maps a k-gram to a postings list of all possible vocabulary terms that contain it. The figure below shows the k-gram postings list corresponding to the bigram “ur”. It is noteworthy that the postings list is sorted alphabetically.
Now that the permuterm index enables us to identify the original vocabulary terms matching a wildcard query, we look up these terms in the standard inverted index to retrieve matching documents. We can thus handle any wildcard query with a single * symbol.
The idea behind Permuterm Index is to rotate wildcard query such that * goes to the end.
Thus, you convert vague query to comparable query. As you wrote, lookup $X* for query X* because * is uncertain but the start part X is deterministic.
When it comes to X, we have two stars. The problem is which star should we rotate .
Rotate 1st star
You regard X* as one part Y, then we get *Y. So we should lookup Y$*, which is X*$*, equivalent to X*.
Rotate 2nd star
You regard *X as one part Y, then we get Y*. So we should lookup *$Y, which is *$*X.This is not easy to handle with.
Based on that, we can know why we lookup X* when we have queries like *X*. The reason is not $ is that $ means the end of the word whereas our query doesn't contain information about the end.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With