
IDF has no effect on ranking one-term queries

I was reading through this article and it said that

Note that IDF is dependent on the query term (T) and the database as a whole. In particular, it does not vary from document to document. Therefore, IDF will have no effect on 1-word queries.

I don't quite get this. If TF-IDF(T) = TF * log(N/dbCount[T]), why doesn't IDF have an effect on a 1-word query?

Bill Cheng asked Feb 26 '16 16:02



2 Answers

For a given corpus, each word's IDF is constant. What does it mean that IDF has no effect on ranking when the query is a single word? The IDF of every term is already known, so when a single-word query hits the system, each document's score is its TF multiplied by that one IDF value. The IDF acts as a scalar coefficient, making the score a linear function of TF, so the sorted result list is the same as one sorted by TF alone.

However, when two or more terms are sent as a query, real ranking comes into play: each query term contributes its own IDF weight, so the terms influence the results differently and the final score is no longer a simple rescaling of a single TF value.
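As a rough illustration of the multi-term case, here is a minimal sketch using the formula from the question, TF * log(N/dbCount[T]), summed over the query terms. The three documents, their term counts, and the term names `common` and `rare` are made up for the example and do not come from the article.

```python
import math

# Hypothetical term frequencies for a two-term query in a three-document corpus.
tf = {
    "D1": {"common": 5, "rare": 0},
    "D2": {"common": 1, "rare": 2},
    "D3": {"common": 3, "rare": 1},
}
query = ["common", "rare"]
N = len(tf)  # total number of documents


def idf(term):
    """log(N / dbCount[T]), where dbCount[T] counts documents containing the term."""
    db_count = sum(1 for counts in tf.values() if counts[term] > 0)
    return math.log(N / db_count)


def rank(score):
    """Return document ids sorted by descending score."""
    return sorted(tf, key=score, reverse=True)


# Ranking by summed raw TF versus summed TF * IDF over the query terms.
by_tf = rank(lambda d: sum(tf[d][t] for t in query))
by_tf_idf = rank(lambda d: sum(tf[d][t] * idf(t) for t in query))

print(by_tf)
print(by_tf_idf)
```

In this toy corpus `common` appears in every document, so its IDF is log(3/3) = 0 and only `rare` contributes to the TF-IDF score. That is enough to reorder the documents relative to the TF-only ranking.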

Hope this clarifies it for others like me :-)

zorze answered Sep 23 '22 14:09



To understand this, let's look at what TF-IDF actually achieves. Say we have N documents D1, D2, D3, ..., DN. We want to assign a TF-IDF score to each of these documents; the document with the highest score is the most relevant result, followed by the document with the second-highest score, and so on. Now, IDF depends only on the query term and on the entire corpus, so its value is constant across documents: in log(N/dbCount[T]), neither N nor dbCount[T] depends on the document, so it is the same for D1, D2, D3, ..., DN. Each document's TF-IDF score is therefore scaled up or down by the same constant, and the relative ranking does not change. Hence, for a one-term query you can skip IDF entirely.
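The argument above can be checked with a small sketch. The document names and term frequencies below are hypothetical, and the score is the TF * log(N/dbCount[T]) formula from the question.

```python
import math

# Hypothetical corpus: frequency of the single query term in each document.
tf = {"D1": 3, "D2": 0, "D3": 7, "D4": 1}

N = len(tf)                                       # total number of documents
db_count = sum(1 for f in tf.values() if f > 0)   # documents containing the term
idf = math.log(N / db_count)                      # identical for every document


def rank(scores):
    """Return document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


tf_only = rank(tf)                                        # rank by TF alone
tf_idf = rank({d: f * idf for d, f in tf.items()})        # rank by TF * IDF

print(tf_only)
print(tf_idf)
```

Both lists come out in the same order because every score is multiplied by the same positive constant. The sketch assumes the term is missing from at least one document; if it appeared in all of them, the IDF would be log(1) = 0 and every score would tie at zero.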

Amit Priyadarshi answered Sep 24 '22 14:09
