Given a query and a term, how could I calculate the average position of the term within every document matched by the query and return it? I am looking for the fastest (performance-wise) solution and am willing to extend Solr's functionality.
Following that, I would need to calculate the average position of a term across all documents matched by the query. For that, I do not need to return the documents themselves to the client, just the average term position.
Thanks Saar
One solution involves quite a lot of coding, and I'm not aware of a shortcut, because you need to traverse term positions within documents. There is no built-in functionality to do this via functions, but you could also think about using payloads somehow.
Perhaps another option is to alter the indexing logic and calculate those averages at the analysis stage. If you manage to do so (putting the result into a payload), you can fetch this information much faster at query time, but it means developing a sophisticated analysis filter.
If I understand you correctly, you would like to compute the arithmetic mean of all positions of a term in the document set returned for a particular query.
Here's what I could come up with.
First of all, you must enable positional information while indexing to extract any positional info from the index.
Take a look at this component: The Term Vector Component
The response would contain what you need to compute the arithmetic mean.
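Once the positions have been pulled out of the response, the averaging itself is simple. Here is a minimal sketch, assuming the positions of the target term have already been extracted into a plain mapping from document id to a list of positions. That mapping, the function name `average_positions`, and the sample data are made up for illustration and are not the component's actual response format. It returns both the per-document average and the average over all positions pooled together.

```python
from statistics import mean


def average_positions(positions_by_doc):
    """Return (per-document mean position, mean over all positions pooled).

    positions_by_doc is a hypothetical structure: {doc_id: [position, ...]}.
    Documents with no positions for the term are skipped.
    """
    per_doc = {
        doc_id: mean(positions)
        for doc_id, positions in positions_by_doc.items()
        if positions
    }
    pooled = [p for positions in positions_by_doc.values() for p in positions]
    overall = mean(pooled) if pooled else None
    return per_doc, overall


# Made-up positions of one term in three documents.
example = {"doc1": [1, 5], "doc2": [2], "doc3": []}
per_doc, overall = average_positions(example)
```

Note that the pooled average weights each occurrence equally; if every document should count equally instead, average the values of `per_doc`.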
Please do not forget to specify the term you are looking for in the query. For example: q=(field1:someExQueryIfNeeded AND field2:targetTerm)
Make sure that you retrieve only the minimum you need. If you end up receiving a lot of noise, you can always customize this component as a Solr plugin and return only the info you need.
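To illustrate keeping the response small, here is a rough sketch of building such a request. The `tv`, `tv.fl` and `tv.positions` parameters belong to the Term Vector Component; the host, core name (`mycore`), handler path (`/tvrh`, as in the sample configs), the `id` key field, and the helper name `build_tv_query` are assumptions that depend on your setup.

```python
from urllib.parse import urlencode


def build_tv_query(base_url, query, field):
    """Build a request URL that asks only for term positions of one field."""
    params = {
        "q": query,
        "fl": "id",              # assumed unique key field; skip other stored fields
        "tv": "true",            # turn the Term Vector Component on
        "tv.fl": field,          # restrict term vectors to the field of interest
        "tv.positions": "true",  # include term positions
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, core and handler path; adjust to your installation.
url = build_tv_query(
    "http://localhost:8983/solr/mycore/tvrh",
    "field1:someExQueryIfNeeded AND field2:targetTerm",
    "field2",
)
```

The response still lists every term of the field for each document, so the client (or a customized component) has to pick out the target term.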