Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does a permuterm index works?

I have read Permuterm indexes page on stanford website, however I can't still figure out how we can reach from: *X* to X*.

So where is the $?


I can get these ones:

For X, look up X$
For X*, look up $X*
For *X, look up X$*
For X*Y, look up Y$X*
like image 221
Ravexina Avatar asked Jun 18 '18 17:06

Ravexina


People also ask

What is a Permuterm index?

The Permuterm index [Garfield 1976] is a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size.

Does Permuterm index require Postfiltering?

k-gram index is more space-efficient permuterm index does not require postfiltering.

What is K-gram index in information retrieval?

A k-gram index maps a k-gram to a postings list of all possible vocabulary terms that contain it. The figure below shows the k-gram postings list corresponding to the bigram “ur”. It is noteworthy that the postings list is sorted alphabetically.

How does the permutation index handle wild card queries?

Now that the permuterm index enables us to identify the original vocabulary terms matching a wildcard query, we look up these terms in the standard inverted index to retrieve matching documents. We can thus handle any wildcard query with a single * symbol.


1 Answers

The idea behind Permuterm Index is to rotate wildcard query such that * goes to the end.

Thus, you convert vague query to comparable query. As you wrote, lookup $X* for query X* because * is uncertain but the start part X is deterministic.

When it comes to X, we have two stars. The problem is which star should we rotate .

  1. Rotate 1st star

    You regard X* as one part Y, then we get *Y. So we should lookup Y$*, which is X*$*, equivalent to X*.

  2. Rotate 2nd star

    You regard *X as one part Y, then we get Y*. So we should lookup *$Y, which is *$*X.This is not easy to handle with.

Based on that, we can know why we lookup X* when we have queries like *X*. The reason is not $ is that $ means the end of the word whereas our query doesn't contain information about the end.

like image 122
Diskun Zhu Avatar answered Dec 30 '22 04:12

Diskun Zhu