I was reading through this article and it said that
Note that IDF is dependent on the query term (T) and the database as a whole. In particular, it does not vary from document to document. Therefore, IDF will have no effect on 1-word queries.
I don't quite get this. If TF-IDF(T) = TF * log(N/dbCount[T]),
why doesn't IDF have an effect on a 1-word query?
For a given corpus, each word's IDF stays constant. So what does it mean that IDF has no effect on the ranking for a single-word query? Since the IDF is already known for every term, when a single-word query hits the system, every document's score is just its TF multiplied by the same IDF value. The IDF acts as a constant coefficient, so the search system returns the same sorted list it would get by sorting on TF alone.
However, when two or more terms are sent as a query, IDF starts to matter: each query term has its own IDF, so each term's TF is weighted differently, and the sum of those weighted terms can order the documents differently than the raw TFs would.
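To make the two-term case concrete, here is a minimal sketch on a made-up corpus. The document names, the term counts, and the helper functions `idf` and `rank` are all assumptions for illustration, and the score is taken to be a plain sum of TF * IDF over the query terms, which may differ from what a real search system does.

```python
import math

# Hypothetical toy corpus: raw term counts per document.
docs = {
    "D1": {"the": 5, "zebra": 1},
    "D2": {"the": 7},
    "D3": {"the": 2, "zebra": 3},
    "D4": {"the": 1},
    "D5": {"cat": 2},
}

def idf(term):
    # log(N / dbCount[T]); assumes the term occurs in at least one document.
    n_docs = len(docs)
    db_count = sum(1 for counts in docs.values() if term in counts)
    return math.log(n_docs / db_count)

def rank(query, use_idf):
    # Score = sum over query terms of TF (optionally weighted by IDF).
    scores = {
        name: sum(counts.get(t, 0) * (idf(t) if use_idf else 1.0) for t in query)
        for name, counts in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

query = ["the", "zebra"]
tf_only = rank(query, use_idf=False)  # common word "the" dominates
tf_idf = rank(query, use_idf=True)    # rarer word "zebra" gets more weight
print(tf_only)
print(tf_idf)
```

With these counts the two orderings differ, because "zebra" appears in fewer documents than "the" and so gets a larger IDF. Calling `rank(["the"], ...)` with and without IDF gives the same order.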
Hope this clarifies things for others like me :-)
To understand this, let's look at what TF-IDF actually achieves. Say we have N documents D1, D2, D3, ..., DN. We want to assign a TF-IDF score to each of these documents; the document with the highest TF-IDF score is the most relevant result, followed by the document with the second-highest score, and so on. Now, IDF depends only on the query term and on the entire corpus, so its value is a constant for all documents: in log(N/dbCount[T]), neither N nor dbCount[T] depends on the document, so it is the same for D1, D2, D3, ..., DN. Each document's TF-IDF score is therefore scaled up or down by that same constant, and the relative ranking does not change. Hence, for a one-term query you can skip the IDF entirely.
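The scaling argument can be checked with a few lines of Python. This is only a sketch: the per-document counts in `tf` are invented, and `ranking` is a hypothetical helper that sorts document names by score.

```python
import math

# Hypothetical data: how often the single query term T occurs in each document.
tf = {"D1": 3, "D2": 0, "D3": 7, "D4": 1}

N = len(tf)                                       # total number of documents
db_count = sum(1 for c in tf.values() if c > 0)   # documents containing T
idf = math.log(N / db_count)                      # identical for every document

def ranking(scores):
    # Document names ordered from highest to lowest score.
    return sorted(scores, key=scores.get, reverse=True)

by_tf = ranking(tf)
by_tf_idf = ranking({doc: count * idf for doc, count in tf.items()})
print(by_tf)
print(by_tf_idf)
```

Both lists come out in the same order, since multiplying every score by the same positive constant cannot reorder them. One edge case to keep in mind: if T occurs in every document, log(N/dbCount[T]) is 0 and all TF-IDF scores collapse to 0.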