I want the queries to return a score that is calculated like this:
(occurrences of each query term in title + description) / (number of query terms)
For example:
EbSearch.add [
  new_job(id: 1, title: "Java Programmierer",
          description: "Java Programmierer")
]
res = EbSearch.search("Java Programmierer").results.first.score.should == 4
At the moment it outputs 8, because it runs the query for each term and sums the results. I could just divide afterwards, but I don't have the analyzed query terms, so compound words could distort the score.
The query is structured as follows:
search = Tire.search index_name do
  query do
    dis_max do
      query { string query, fields: ['title^3', 'description.with_synonyms^0.5'], use_dis_max: false, default_operator: "OR" }
      query { string query, fields: ['title^3', 'description.without_synonyms'], use_dis_max: false, default_operator: "OR" }
    end
  end
end
Any idea how I could solve this problem would be greatly appreciated.
EDIT
I realized that I did not provide enough context.
Here are some other snippets I have already worked out. I wrote a custom SimilarityProvider to disable idf and normalization: https://gist.github.com/outsmartin/6114175
The complete Tire code can be found here: https://gist.github.com/6114186. It is a little more complicated than the example, but it should be understandable.
You can easily get the list of analyzed terms for your query using the analyze API. However, I have to mention that Elasticsearch scoring is much more complicated than it might seem when you run your tests on tiny indices. You can find the formula that Elasticsearch uses in the Lucene documentation, and you can use the explain option to see how this formula is applied to your results. I would also suggest testing and tuning your scoring algorithm on an index with a single shard, or using the dfs_query_then_fetch search type, which produces more precise results on small indices.
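As a rough sketch of the "divide afterwards" idea from the question, the snippet below counts the tokens returned by the analyze API and divides a raw score by that count. The helper names (fetch_analysis, analyzed_terms, score_per_term) are made up for illustration, the request shape assumes a recent Elasticsearch that accepts a JSON body on _analyze (older releases take the analyzer and text as URL parameters), and counting distinct tokens is just one possible choice that may need adjusting for synonyms or decompounded words.

```ruby
require "json"
require "net/http"
require "uri"

# Ask Elasticsearch how it would tokenize the query text.
# Assumption: the cluster accepts a JSON body on <index>/_analyze.
# Not called below, so the example runs without a cluster.
def fetch_analysis(base_url, index, analyzer, text)
  uri = URI("#{base_url}/#{index}/_analyze")
  body = JSON.generate(analyzer: analyzer, text: text)
  Net::HTTP.post(uri, body, "Content-Type" => "application/json").body
end

# Pull the token strings out of an _analyze response body.
def analyzed_terms(response_json)
  JSON.parse(response_json).fetch("tokens", []).map { |t| t["token"] }
end

# Divide the raw score by the number of distinct analyzed terms.
def score_per_term(raw_score, terms)
  count = terms.uniq.size
  count.zero? ? 0.0 : raw_score.to_f / count
end

# Hand-written response in the shape _analyze returns for "Java Programmierer".
sample = '{"tokens":[{"token":"java","position":0},' \
         '{"token":"programmierer","position":1}]}'

terms = analyzed_terms(sample)
puts score_per_term(8, terms) # prints 4.0
```

With a running cluster, the hand-written sample would be replaced by something like fetch_analysis("http://localhost:9200", index_name, "your_analyzer", query), where the URL and analyzer name are placeholders for whatever your setup uses.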