How do you configure Lucene in Sitecore to only index the latest version of an item on the master db?

I recognise this is a moot point on the web database, so this question applies to the master db...

I have a custom index set up in Sitecore 6.4.1 as follows:

<index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
    <param desc="name">$(id)</param>
    <param desc="folder">_search_content_US</param>
    <Analyzer ref="search/analyzer" />
    <locations hint="list:AddCrawler">
        <search_content_home type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
            <Database>master</Database>
            <Root>/sitecore/content/usa home</Root>
            <Tags>home content</Tags>
        </search_content_home>
    </locations>
</index>

I query the index like this (I am using techphoria414's SortableIndexSearchContext from this answer: How to sort/filter using the new Sitecore.Search API):

private SearchHits GetSearchResults(SortableIndexSearchContext searchContext, string searchTerm)
{
    CombinedQuery query = new CombinedQuery();
    query.Add(new FullTextQuery(searchTerm), QueryOccurance.Must);
    return searchContext.Search(query, Sort.RELEVANCE);
}

...

SearchHits hits = GetSearchResults(searchContext, searchTerm);

hits is a collection of search hits from my index. When I iterate through hits, I can see many duplicates of the same Sitecore item, one per version of the item.

I then do the following to get a SearchResultCollection:

SearchResultCollection results = hits.FetchResults(0, hits.Length);

This combines all of the duplicates into a single SearchResult object. This object represents 1 version of a particular item, and has a property called SubResults which is a collection of SearchResults that represent all of the other item versions.

Here's my problem:

The version of the item represented by the SearchResult is NOT the current published version of the item! It appears to be a randomly selected version (whichever the search method hit first in the index). The latest version is included in the SubResults collection, however.

E.g.:

SearchResult
 |
 |- Version 8 // main result
 ...
 |- SubResults
      |
      |- Version 9 // latest version
      |- Version 3
      |- Version 5
      ... // all versions in random order

How do I prevent this from happening on the master db? Either by preventing Lucene from indexing old versions of items, or by doing some manipulation of the result set to get the latest version from the SubResults?

As an aside, why does Lucene bother to index old versions of items anyway? Surely this is pointless for searching content on your website as the old versions are not visible?

theyetiman asked Dec 04 '12 11:12

2 Answers

You can implement a custom crawler that overrides the following:

public class IndexCrawler : DatabaseCrawler
{
    protected override void IndexVersion(Item item, Item latestVersion, Sitecore.Search.IndexUpdateContext context)
    {
        if (item.Versions.Count > 0 && item.Version.Number != latestVersion.Version.Number)
            return;

        base.IndexVersion(item, latestVersion, context);
    }
}

This ensures that only the latest version of an item gets into your index, so it is the only version that can be pulled out of that index.

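You would, of course, need to update your configuration file so that the crawler entry inside the index's locations element uses this class as its type instead of Sitecore.Search.Crawlers.DatabaseCrawler.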
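
As a rough sketch of that configuration change, the crawler entry from the question could look something like the fragment below. The type string "MyProject.Search.IndexCrawler, MyProject" is a hypothetical namespace and assembly name, not something from the question; substitute wherever the IndexCrawler class actually lives. Everything else is copied from the question's index definition.

```xml
<locations hint="list:AddCrawler">
    <!-- Assumption: IndexCrawler is compiled into an assembly named MyProject
         under the namespace MyProject.Search; replace both with your own. -->
    <search_content_home type="MyProject.Search.IndexCrawler, MyProject">
        <Database>master</Database>
        <Root>/sitecore/content/usa home</Root>
        <Tags>home content</Tags>
    </search_content_home>
</locations>
```

After a change like this the index would most likely need to be rebuilt, since entries for older versions that were already indexed are not removed just by swapping the crawler type.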

Andrew Burgess answered Nov 04 '22 00:11


In Sitecore 7, a field named _latestversion was added to the index. It contains '1' for the latest version of an item; for other versions the value is empty.

Stijn De Vos answered Nov 04 '22 01:11