Limiting terms in Solr's TermsComponent to terms originating from certain documents

I am using Solr's TermsComponent to implement an autocomplete feature. My documents contain tags, which I have indexed in a "tags" field. I can use TermsComponent to find out which tags are used across all stored documents, and this works well so far.

However, there is an additional requirement: every document has an owner field containing the ID of the user who owns it. The autocomplete list should only contain tags from documents owned by the user requesting the autocomplete.

I have tried setting the query parameter, but the TermsComponent seems to ignore it:

public static List<String> findUniqueTags(String beginningWith, User owner) throws IOException {
    SolrParams q = new SolrQuery().setQueryType("/terms")
            .setQuery("owner:" + owner.id.toString())
            .set(TermsParams.TERMS, true).set(TermsParams.TERMS_FIELD, "tags")
            .set(TermsParams.TERMS_LOWER, beginningWith)
            .set(TermsParams.TERMS_LOWER_INCLUSIVE, false)
            .set(TermsParams.TERMS_PREFIX_STR, beginningWith);
    QueryResponse queryResponse;
    try {
        queryResponse = getSolrServer().query(q);
    } catch (SolrServerException e) {
        Logger.error(e, "Error when querying server.");
        throw new IOException(e);
    }

    // The response nests the terms per field: terms -> tags -> (term, count) pairs.
    NamedList<?> terms = (NamedList<?>) queryResponse.getResponse().get("terms");
    NamedList<?> tags = (NamedList<?>) terms.get("tags");

    // Collect only the term text; the per-term document counts are not needed here.
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, ?> tag : tags) {
        result.add(tag.getKey());
    }
    return result;
}

So is there a way to limit the tags returned by TermsComponent, or do I have to query all of the user's tags and filter them myself?

Jan Thomä asked Mar 09 '11 19:03



1 Answer

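According to two posts on the Solr mailing list, filtering in the TermsComponent is not possible because it operates on raw index data. It reads terms straight from the index rather than from the result set of a query, which is why your query parameter is ignored.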

Apparently, the Solr developers are working on a real autosuggest component that supports this kind of filtering.

Depending on your requirements, you might be able to use the faceting component for autocomplete instead of the terms component. It fully supports filter queries, so you can restrict the set of eligible tags to those from a subset of the documents in the index.
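As a rough sketch of the faceting approach, the request could carry parameters along these lines. The parameter names (fq, facet.field, facet.prefix, facet.mincount, facet.limit) are standard Solr request parameters; the class and method names are hypothetical helpers made up for illustration, and the sketch assumes the owner ID is a plain token that needs no query-syntax escaping. It only builds the query string and does not talk to a Solr server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the parameters of a facet request that
// returns only tags from documents owned by one user.
class FacetAutocompleteParams {

    static Map<String, String> build(String prefix, String ownerId, int limit) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");                  // match everything, then narrow with fq
        p.put("fq", "owner:" + ownerId);    // assumes ownerId needs no escaping
        p.put("rows", "0");                 // only facet counts are needed, no documents
        p.put("facet", "true");
        p.put("facet.field", "tags");
        p.put("facet.prefix", prefix);      // only tags starting with the typed text
        p.put("facet.mincount", "1");       // drop tags the owner's documents never use
        p.put("facet.limit", String.valueOf(limit));
        return p;
    }

    // URL-encodes the parameters into a query string (requires Java 10+
    // for the URLEncoder.encode(String, Charset) overload).
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

The matching tags then come back under facet_counts -> facet_fields -> tags in the response. Setting facet.mincount to 1 matters here: without it, tags that exist in the index but not in the owner's documents would still be listed with a count of 0. With SolrJ, the same parameters can be set on a SolrQuery via addFilterQuery, setFacet, addFacetField, setFacetPrefix, setFacetMinCount and setFacetLimit.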

Thomas answered Sep 29 '22 20:09