I have Sitecore pages / Lucene documents with the following fields:
I'm creating a search for these and have the following requirements:
Here is what I've got:
public static Expression<Func<T, bool>> GetSearchTermPredicate<T>(string searchTerm)
    where T : ISearchableItem
{
    var actualPhrasePredicate = PredicateBuilder.True<T>()
        .Or(r => r.Title.Contains(searchTerm).Boost(2f))
        .Or(r => r.FileName.Contains(searchTerm).Boost(1.5f))
        .Or(r => r.Content.Contains(searchTerm))
        .Or(r => r.DocumentContents.Contains(searchTerm));

    var individualWordsPredicate = PredicateBuilder.False<T>();
    foreach (var term in searchTerm.Split(' '))
    {
        individualWordsPredicate
            = individualWordsPredicate.And(r =>
                r.Title.Contains(term).Boost(2f)
                || r.FileName.Contains(term).Boost(1.5f)
                || r.Content.Contains(term)
                || r.DocumentContents.Contains(term));
    }

    return PredicateBuilder.Or(actualPhrasePredicate.Boost(2f),
        individualWordsPredicate);
}
The actual phrase part seems to work well. Hits with the full phrase in the title are returned first. However, if I remove a word from the middle of the phrase, no results are returned.
For example, I have a page titled "The England football team are dreadful", but searching for "The football team are dreadful" does not find it.
Note: pages can have documents attached to them, so I want to boost the filenames too but not as highly as the page title.
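To make the missing-word case above concrete, here is a minimal sketch using plain .NET strings rather than the Sitecore/Lucene provider. The class and method names (PhraseVersusTerms, ContainsPhrase, ContainsAllTerms) are made up for illustration, and string.Contains is case-sensitive and unanalyzed, so it only approximates what the index does.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Sitecore: compares a whole-phrase match
// with an "every word appears somewhere" match on an in-memory string.
public static class PhraseVersusTerms
{
    // True only if the phrase occurs as one contiguous substring.
    public static bool ContainsPhrase(string text, string phrase)
        => text.Contains(phrase);

    // True if each space-separated word occurs somewhere in the text.
    public static bool ContainsAllTerms(string text, string phrase)
        => phrase.Split(' ').All(term => text.Contains(term));

    public static void Main()
    {
        const string title = "The England football team are dreadful";
        const string query = "The football team are dreadful";

        // The phrase is not contiguous in the title because "England" is missing.
        Console.WriteLine(ContainsPhrase(title, query));   // False
        // Every individual word of the query is present in the title.
        Console.WriteLine(ContainsAllTerms(title, query)); // True
    }
}
```

This is why the search needs a per-word fallback in addition to the whole-phrase match.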
I managed to get this to work with the following:
public static Expression<Func<T, bool>> GetSearchTermPredicate<T>(string searchTerm)
    where T : ISearchableItem
{
    var actualPhraseInTitlePredicate = PredicateBuilder.True<T>()
        .And(r => r.Title.Contains(searchTerm));
    var actualPhraseInFileNamePredicate = PredicateBuilder.True<T>()
        .And(r => r.FileName.Contains(searchTerm));
    var actualPhraseInContentPredicate = PredicateBuilder.True<T>()
        .And(r => r.Content.Contains(searchTerm));
    var actualPhraseInDocumentPredicate = PredicateBuilder.True<T>()
        .And(r => r.DocumentContents.Contains(searchTerm));

    var terms = searchTerm.Split(' ');

    var titleContainsAllTermsPredicate = PredicateBuilder.True<T>();
    foreach (var term in terms)
        titleContainsAllTermsPredicate
            = titleContainsAllTermsPredicate.And(r => r.Title.Contains(term).Boost(2f));

    var fileNameAllTermsContains = PredicateBuilder.True<T>();
    foreach (var term in terms)
        fileNameAllTermsContains
            = fileNameAllTermsContains.And(r => r.FileName.Contains(term));

    var contentContainsAllTermsPredicate = PredicateBuilder.True<T>();
    foreach (var term in terms)
        contentContainsAllTermsPredicate
            = contentContainsAllTermsPredicate.And(r => r.Content.Contains(term));

    var documentContainsAllTermsPredicate = PredicateBuilder.True<T>();
    foreach (var term in terms)
        documentContainsAllTermsPredicate
            = documentContainsAllTermsPredicate.And(r => r.DocumentContents.Contains(term));

    var predicate = actualPhraseInTitlePredicate.Boost(3f)
        .Or(actualPhraseInFileNamePredicate.Boost(2.5f))
        .Or(actualPhraseInContentPredicate.Boost(2f))
        .Or(actualPhraseInDocumentPredicate.Boost(1.5f))
        .Or(titleContainsAllTermsPredicate.Boost(1.2f))
        .Or(fileNameAllTermsContains.Boost(1.2f))
        .Or(contentContainsAllTermsPredicate)
        .Or(documentContainsAllTermsPredicate);

    return predicate;
}
It's obviously quite a bit more code, but I think separating the predicates makes more sense for boosting to work effectively.
The previous code had two main problems.
First, PredicateBuilder.Or(actualPhrasePredicate.Boost(2f), individualWordsPredicate) doesn't seem to include the predicate being Or'd. Calling .ToString() on the joined predicate showed nothing for individualWordsPredicate.
Second, I used PredicateBuilder.False<T>() as the seed for individualWordsPredicate. The resulting expression was basically (False AND Field.Contains(keyword)), which of course can never evaluate to true. Using .True<T>() fixed this.
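The seed problem can be reproduced outside Sitecore. The sketch below uses plain System.Linq.Expressions with a small hand-written And helper that only stands in for PredicateBuilder.And; SeedDemo, And, and AllTerms are hypothetical names, and the real Sitecore implementation may combine expressions differently.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class SeedDemo
{
    // Hypothetical stand-in for a PredicateBuilder-style And: joins two
    // predicates with AndAlso by invoking the right-hand lambda on the
    // left-hand lambda's parameter.
    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, invoked), left.Parameters);
    }

    // Builds "seed AND title.Contains(word1) AND title.Contains(word2) ...".
    public static Func<string, bool> AllTerms(bool seed, string searchTerm)
    {
        Expression<Func<string, bool>> predicate = title => seed;
        foreach (var term in searchTerm.Split(' '))
        {
            predicate = predicate.And(title => title.Contains(term));
        }
        return predicate.Compile();
    }

    public static void Main()
    {
        const string title = "The England football team are dreadful";
        const string query = "The football team are dreadful";

        // A false seed makes the whole AND chain false, whatever the terms are.
        Console.WriteLine(AllTerms(false, query)(title)); // False
        // A true seed is neutral for AND, so the result depends on the terms.
        Console.WriteLine(AllTerms(true, query)(title));  // True
    }
}
```

The general rule is that an AND chain should start from True and an OR chain from False, since those seeds leave the combined result unchanged.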