How do I perform a reasonably complex RavenDB query and include the Lucene Score in the results?

Say I have the following User class:

public class User
{
    // ... lots of other stuff
    public string Id{ get; set; }
    public double Relevance { get; set; }
    public bool IsMentor { get; set; }
    public string JobRole { get; set; }
    public bool IsUnavailable { get; set; }
    public List<string> ExpertiseAreas { get; set; }
    public List<string> OrganisationalAreas { get; set; }
}

Now I want to perform a search that will find all the Users that fully match the following criteria:

  • IsMentor equals true
  • IsUnavailable equals false
  • Id is not equal to a single, excluded user (the person doing the search)

I also want the results to fully or partially match the following criteria, but only if search terms are supplied; otherwise I want the constraint to be ignored.

  • JobRole = [value]
  • ExpertiseAreas contains items from [value-1, value-2, value-n]
  • OrganisationalAreas contains items from [value-1, value-2, value-n]

The list of Users returned from this query may not all equally match the criteria. Some will be better matches than others. So I want to order my results by how well they match.

When I display my results I want each result to be given a star-rating (1-5) that indicates how well the user has matched the search.

I spent a few days working out how to do this. So I will now answer my own question and hopefully save you some effort. The answer will not be perfect of course, so please, if you can improve it, do so.

biofractal asked Nov 07 '12 13:11



1 Answer

First I need a RavenDB index that includes all the fields I will be searching over. This is easy.

Index

public class User_FindMentor : AbstractIndexCreationTask<User>
{
    public User_FindMentor()
    {
        Map = users => users.Select(user => new
        {
            user.Id,
            user.IsUnavailable,
            user.IsMentor,
            user.OrganisationalAreas,
            user.ExpertiseAreas,
            user.JobRole
        });
    }
}

Next I need a service method to perform the query. This is where all the magic happens.

Search Service

public static Tuple<List<User>, RavenQueryStatistics> FindMentors(
        IDocumentSession db,
        string excludedUserId = null,
        string expertiseAreas = null,
        string jobRoles = null,
        string organisationalAreas = null,
        int take = 50)
{
    RavenQueryStatistics stats;
    var query = db
            .Advanced
            .LuceneQuery<User, RavenIndexes.User_FindMentor>()
            .Statistics(out stats)
            .Take(take)
            .WhereEquals("IsMentor", true).AndAlso()
            .WhereEquals("IsUnavailable", false).AndAlso()
            .Not.WhereEquals("Id", excludedUserId);

    if (expertiseAreas.HasValue())
        query = query
                .AndAlso()
                .WhereIn("ExpertiseAreas", expertiseAreas.SafeSplit());

    if (jobRoles.HasValue())
        query = query
                .AndAlso()
                .WhereIn("JobRole", jobRoles.SafeSplit());

    if (organisationalAreas.HasValue())
        query = query
                .AndAlso()
                .WhereIn("OrganisationalAreas", organisationalAreas.SafeSplit());

    var mentors = query.ToList();

    if (mentors.Count > 0)
    {
        var max = db.GetRelevance(mentors[0]);
        mentors.ForEach(mentor =>
                        mentor.Relevance = Math.Floor((db.GetRelevance(mentor)/max)*5));
    }

    return Tuple.Create(mentors, stats);
}

Note that in the code snippet below I have not written my own Lucene query string generator. I did, in fact, write one, and it was a thing of beauty, but then I discovered that RavenDB has a much better fluent interface for building dynamic queries. So save your tears and use the native query interface from the start.

RavenQueryStatistics stats;
var query = db
        .Advanced
        .LuceneQuery<User, RavenIndexes.User_FindMentor>()
        .Statistics(out stats)
        .Take(take)
        .WhereEquals("IsMentor", true).AndAlso()
        .WhereEquals("IsUnavailable", false).AndAlso()
        .Not.WhereEquals("Id", excludedUserId);

Next you can see that I am checking whether or not the search has passed in any values for the conditional elements of the query, for example:

if (expertiseAreas.HasValue())
    query = query
            .AndAlso()
            .WhereIn("ExpertiseAreas", expertiseAreas.SafeSplit());

This uses a few extension methods that I have found generally useful:

public static bool HasValue(this string candidate)
{
    return !string.IsNullOrEmpty(candidate);
}

public static bool IsEmpty(this string candidate)
{
    return string.IsNullOrEmpty(candidate);
}

public static string[] SafeSplit(this string commaDelimited)
{
    return commaDelimited.IsEmpty() ? new string[] { } : commaDelimited.Split(',');
}
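
One caveat: SafeSplit keeps any whitespace around each term, so an input like "sales, marketing" produces " marketing" with a leading space, which would presumably not match the indexed value. Below is a minimal sketch of a stricter variant, assuming you want terms trimmed and blank entries dropped. The name SafeSplitTrimmed is hypothetical and not part of the original code.

```csharp
using System;
using System.Linq;

public static class SearchTermExtensions
{
    // Hypothetical variant of SafeSplit: trims each term and drops empty entries.
    public static string[] SafeSplitTrimmed(this string commaDelimited)
    {
        // Null, empty, or whitespace-only input yields no terms at all.
        if (string.IsNullOrWhiteSpace(commaDelimited))
            return new string[0];

        return commaDelimited
            .Split(',')
            .Select(term => term.Trim())
            .Where(term => term.Length > 0)
            .ToArray();
    }
}
```

If you adopt something like this, it could be passed to WhereIn in place of SafeSplit().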

Then we have the bit that works out the Relevance of each result. Remember I want to have my results display 1 to 5 stars so I want my Relevance value to be normalized within this range. To do this I must find out the maximum Relevance, which in this case is the value of the first User in the list. This is because Raven auto-magically orders results by relevance if you don't otherwise specify a sort order - very handy.

if (mentors.Count > 0)
{
    var max = db.GetRelevance(mentors[0]);
    mentors.ForEach(mentor =>
                    mentor.Relevance = Math.Floor((db.GetRelevance(mentor)/max)*5));
}
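
One thing to watch in the snippet above: Math.Floor maps any score below a fifth of the maximum to 0 stars, and if the top score were ever 0 the division would give NaN. Here is a minimal sketch of a stricter normalisation, assuming every returned result deserves at least one star. It swaps Math.Floor for Math.Ceiling and clamps the result to 1..5. StarRating and ToStars are hypothetical names, not part of the original code.

```csharp
using System;

public static class StarRating
{
    // Maps a raw score onto 1..5 stars relative to the best score in the result set.
    // Assumption: any result that was returned at all gets at least one star.
    public static double ToStars(double score, double maxScore)
    {
        // A maximum that is zero, negative, or NaN cannot be used as a divisor.
        if (!(maxScore > 0))
            return 1;

        var stars = Math.Ceiling((score / maxScore) * 5);

        // Clamp into the displayable range.
        return Math.Max(1, Math.Min(5, stars));
    }
}
```

Inside the ForEach this would be used as mentor.Relevance = StarRating.ToStars(db.GetRelevance(mentor), max).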

Extracting the Relevance relies on yet another extension method that pulls the Lucene score from the RavenDB document's metadata, like this:

public static double GetRelevance<T>(this IDocumentSession db, T candidate)
{
    return db
        .Advanced
        .GetMetadataFor(candidate)
        .Value<double>("Temp-Index-Score");
}

Finally we return the list of results along with the query statistics using the new Tuple widget. If you, like me, have not used the Tuple before, it turns out to be an easy way to send more than one value back from a method without using out params. That's it. So define your method return type and then use 'Tuple.Create()', like this:

public static Tuple<List<User>, RavenQueryStatistics> FindMentors(...)
{
    ...
    return Tuple.Create(mentors, stats);
}
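
On the calling side the two values come back out through Item1 and Item2. Here is a minimal, self-contained sketch of that pattern. FindNames is a hypothetical stand-in for FindMentors, with a plain int playing the part of the query statistics so the example does not need RavenDB.

```csharp
using System;
using System.Collections.Generic;

public static class TupleDemo
{
    // Stand-in for FindMentors: returns a result list plus a total count in one Tuple.
    public static Tuple<List<string>, int> FindNames()
    {
        var names = new List<string> { "Ann", "Bob" };
        return Tuple.Create(names, 42);
    }

    // Unpacks both values from the Tuple and formats them.
    public static string Describe()
    {
        var result = FindNames();
        List<string> names = result.Item1; // first value: the results
        int total = result.Item2;          // second value: the extra information
        return string.Format("{0} of {1}", names.Count, total);
    }
}
```

With the real method, Item1 would be the List<User> and Item2 the RavenQueryStatistics.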

And that is that for the query.

But what about that cool star-rating I mentioned? Well, since I am the sort of coder who wants the moon-on-a-stick, I used a nice jQuery plugin called raty, which worked just fine for me. Here is some HTML5 + Razor + jQuery to give you the idea:

<div id="find-mentor-results"> 
    @foreach (User user in Model.Results)
    {
        ...stuff
        <div class="row">
            <img class="headshot" src="@user.Headshot" alt="headshot"/>
            <h5>@user.DisplayName</h5>
            <div class="star-rating" data-relevance="@user.Relevance"></div>
        </div> 
        ...stuff                       
    }
</div>

<script>
    $(function () {
        $('.star-rating').raty({
            readOnly: true,
            score: function () {
                return $(this).attr('data-relevance');
            }
        });
    });
</script>

And that really is it. Lots to chew on, lots to improve. Don't hold back if you think there is a better / more efficient way.

Here is a screenshot of some test data:

[image: screenshot of test data]

biofractal answered Oct 26 '22 23:10
