The default value for disable_coord in ES as per documentation is false. I cannot find a detailed explanation for how setting this parameter to true would affect search results.
A Boolean search is a query technique that utilizes Boolean Logic to connect individual keywords or phrases within a single query. The term “Boolean” refers to a system of logic developed by the mathematician and early computer pioneer, George Boole.
A Boolean query, or bool query, in Elasticsearch is a type of search that lets you combine conditions using Boolean logic. Elasticsearch searches the documents in the specified index and returns all the records that match the combination of Boolean clauses.
Term query. Returns documents that contain an exact term in a provided field. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
The score represents how relevant a given document is for a specific query. The default scoring algorithm used by Elasticsearch is BM25.
If there are N subqueries in a bool query with the same boosts/weights, then disable_coord=true follows the logic below.
Assume that:
N -- total number of subqueries.
n -- number of subqueries that matched.
When n subqueries are matched, the total score is proportional to the sum of the boosts/weights of the matched queries. As we are assuming equal weights/boosts, this is: Sn = n*const
When all subqueries are matched (n=N): Smax = N*const
Partial matches compared to a full match: part_of_max = Sn / Smax = (n*const) / (N*const) = n/N
For example, if you have 3 subqueries:
full match (3 of 3): Smax
part_2 = 2/3 = 0.67 (67%) of Smax
part_1 = 1/3 = 0.33 (33%) of Smax
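A minimal sketch of the arithmetic above, assuming equal weights for every subquery as in the text. The function name and the `const` parameter are only illustrative; the point is that `const` cancels out and only n/N remains.

```python
def part_of_max_no_coord(n_matched: int, n_total: int, const: float = 1.0) -> float:
    """Share of the full-match score when coordination is disabled.

    Assumes every subquery has the same weight, so each match adds `const`.
    """
    s_n = n_matched * const    # Sn = n*const
    s_max = n_total * const    # Smax = N*const
    return s_n / s_max         # const cancels, leaving n/N


for n in (3, 2, 1):
    print(n, round(part_of_max_no_coord(n, 3), 2))
```

Changing `const` does not change the result, which is why the ratio is a convenient way to compare partial and full matches.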
Let's compare this to scoring when coordination is enabled (the default behaviour of Elasticsearch). Long story short: "partial" matches will have a much worse score than full matches.
The approximate score will be proportional to the sum of the weights/boosts of the matched subqueries multiplied by n/N. If the boosts/weights are equal, the total score will be proportional to Sn₂ = n*(n/N)*const = n²/N * const
When all subqueries are matched (n=N): Smax₂ = N*(N/N)*const = N*const
Partial matches compared to a full match: part_of_max₂ = Sn₂ / Smax₂ = (n²/N * const) / (N * const) = n²/N²
For example, if you have 3 subqueries:
full match (3 of 3): Smax₂, the same as Smax when coordination is disabled
part_2₂ = 4/9 = 0.44 (44%) of Smax₂, which is 2/3 (about 67%) of part_2
part_1₂ = 1/9 = 0.11 (11%) of Smax₂, which is 1/3 (about 33%) of part_1
Comparing the two coordination approaches: scores with disable_coord=False equal the scores with disable_coord=True multiplied by (n²/N²)/(n/N) = n/N, so they are smaller for every partial match (n < N).
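To see both policies side by side, here is a small sketch that uses exact fractions. It assumes equal weights, as above; `part_of_max` and its `coord_enabled` flag are names made up for this illustration and are not part of any Elasticsearch API.

```python
from fractions import Fraction


def part_of_max(n_matched: int, n_total: int, coord_enabled: bool) -> Fraction:
    """Partial-match score as a fraction of the full-match score."""
    ratio = Fraction(n_matched, n_total)   # n/N
    # With coordination the score is multiplied by n/N once more: n²/N².
    return ratio * ratio if coord_enabled else ratio


N = 3
for n in range(N, 0, -1):
    without_coord = part_of_max(n, N, coord_enabled=False)
    with_coord = part_of_max(n, N, coord_enabled=True)
    # The last column is the factor between the two policies, n/N.
    print(n, without_coord, with_coord, with_coord / without_coord)
```

For N = 3 the last column comes out as 1, 2/3 and 1/3, matching the n/N factor derived above.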
Possible usecases for different query types with different coordination policy:
Note that the same subquery may have a different score when the term appears several times in the document: this is controlled by term frequency (https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html#tf), and it is not affected by disable_coord, contrary to what's said in another answer (https://stackoverflow.com/a/26998760/952437). Field-length normalization also affects how results are calculated.
If you'd like to know how these 3 concepts work together, see the following example:
Query: quick brown fox -- this is actually 3 queries combined with "OR"
In the scores below, tmp_const = const/sqrt(4) = const/2.
disable_coord=True:
quick brown fox rocks -- Score of ~= 3*1/sqrt(4)*const = 3*tmp_const
quick brown fox quick -- Score of ~= (1+1*sqrt(2)+1)*1/sqrt(4)*const = 3.41*tmp_const
quick brown fox quick fox -- Score of ~= (1+1*sqrt(2)+1*sqrt(2))*1/sqrt(5)*const = 3.83*0.89*tmp_const = 3.42*tmp_const. One extra fox makes the result more relevant, but this is compensated by field-length normalization
quick brown bird flies -- Score of ~= 2*1/sqrt(4)*const = 2*tmp_const
quick brown bird -- Score of ~= 2*1/sqrt(3)*const = 2*1.1547*tmp_const ~= 2.31*tmp_const
fox -- Score of ~= 1*1/sqrt(1)*const = 1*2*tmp_const = 2*tmp_const. Only one term matches, yet the score is as high as for quick brown bird flies, which matches two terms. This is caused by field-length normalization
disable_coord=False:
quick brown fox rocks (coord_factor=3/3=1) -- Score of ~= 3*1/sqrt(4)*const = 3*tmp_const
quick brown fox quick (coord_factor=3/3=1) -- Score of ~= (1+1*sqrt(2)+1)*1/sqrt(4)*const = 3.41*tmp_const
quick brown fox quick fox (coord_factor=3/3=1) -- Score of ~= (1+1*sqrt(2)+1*sqrt(2))*1/sqrt(5)*const = 3.83*0.89*tmp_const = 3.42*tmp_const
quick brown bird flies (coord_factor=2/3=0.67) -- Score of ~= 2*1/sqrt(4)*const * 2/3 = 1.33*tmp_const. Lower score thanks to coordination
quick brown bird (coord_factor=2/3=0.67) -- Score of ~= 2*1/sqrt(3)*const * 2/3 = 2*1.1547*tmp_const * 2/3 ~= 1.54*tmp_const. Lower score thanks to coordination
fox (coord_factor=1/3=0.33) -- Score of ~= 1*1/sqrt(1)*const * 1/3 = 2*tmp_const * 1/3 ~= 0.67*tmp_const. Thanks to coordination this result is pushed well below the results with all 3 terms
The real score will also depend on IDF (inverse document frequency). The examples above assume that all the terms have the same frequency in the index.
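The example can be reproduced with a short script. This is only a sketch of the simplified model used in this answer (sqrt(tf) per matched term, 1/sqrt(number of terms) as field-length norm, n/N as coordination factor, IDF ignored), not Lucene's actual implementation. `simplified_score` is a made-up name, and `const` defaults to 2.0 so that the results read directly in units of tmp_const = const/2.

```python
import math
from collections import Counter

QUERY_TERMS = ("quick", "brown", "fox")


def simplified_score(doc: str, coord: bool, const: float = 2.0) -> float:
    """Toy score: sum of sqrt(tf) over matched terms, times field-length norm,
    optionally times the coordination factor n/N. IDF is assumed constant."""
    tokens = doc.split()
    counts = Counter(tokens)
    matched = [term for term in QUERY_TERMS if counts[term] > 0]
    tf_part = sum(math.sqrt(counts[term]) for term in matched)
    length_norm = 1 / math.sqrt(len(tokens))
    score = tf_part * length_norm * const
    if coord:
        score *= len(matched) / len(QUERY_TERMS)
    return score


docs = [
    "quick brown fox rocks",
    "quick brown fox quick",
    "quick brown fox quick fox",
    "quick brown bird flies",
    "quick brown bird",
    "fox",
]
for doc in docs:
    no_coord = simplified_score(doc, coord=False)
    with_coord = simplified_score(doc, coord=True)
    print(f"{doc!r}: coord off {no_coord:.2f}, coord on {with_coord:.2f}")
```

Under this model the single-term document fox counts one matched term, so it comes out at 2*tmp_const without coordination and about 0.67*tmp_const with it, while documents that match all three terms are unaffected by the coordination factor.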
It is used in Lucene scoring. While scoring the results,
Example: I would like to modify the coord score of a bool query so that the score of the entire query is multiplied by 2 if some particular clause, text, or value is matched.
This is the coordination factor.
If the coord factor is enabled (the default, "disable_coord": false), it means: the more of the search keywords a text contains, the more relevant the result is and the higher its score.
If the coord factor is disabled ("disable_coord": true), it means: no matter how many keywords we have in the search text, it will be counted just once.
More details you can find here.
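For reference, this is roughly what a request body with the flag looks like, built here as a Python dict. It is a sketch under assumptions: the field name `title` and the helper `build_bool_query` are hypothetical, and `disable_coord` only applies to older Elasticsearch versions (as far as I know the parameter was removed from the bool query in 6.0).

```python
import json


def build_bool_query(text: str, field: str = "title", disable_coord: bool = False) -> dict:
    """Build a bool query with one `term` clause per word in `text`.

    `field` is a hypothetical field name; `disable_coord` is the flag
    discussed above and is only understood by older Elasticsearch versions.
    """
    should = [{"term": {field: word}} for word in text.lower().split()]
    return {"query": {"bool": {"should": should, "disable_coord": disable_coord}}}


body = build_bool_query("quick brown fox", disable_coord=True)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request; switching `disable_coord` between true and false is the only change needed to compare the two scoring behaviours.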
Suppose you have a should clause containing three queries. If a document matches the first query, it gets some score for that. Now suppose it does not exactly match the second query but matches it partially: the document is then given a little extra score, so its total is (first query match score + second query partial match score).
If you do not want this partial score to be given in your query, write "disable_coord": true. The document will then be scored only on the exactly matching queries, not on the partial matches. I hope you get it now :)