Calculating IDF (as in TF-IDF) when testing?

Question

As I understand it, IDF reflects how many documents contain a term (roughly speaking). You can calculate IDF (along with TF) on the training set, since you have all the documents beforehand. But what if I don't have the test set beforehand and the test documents arrive sequentially (for example, from a web crawler)? How do I calculate the IDF for the words in a document at test time?

MRFS · Accepted Answer

In this case, if your dataset is big enough, you can compute IDF from the training set alone. At test time, if a term appears in the training set, use its training IDF; if the term is new (its document frequency in the training set is zero), compute its IDF from the number of training documents. For some purposes, smoothing methods can give better results.
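A minimal Python sketch of this approach follows. The function names (`fit_idf`, `tfidf`), the toy documents, and the particular smoothing formula `log((1 + N) / (1 + df)) + 1` are assumptions chosen for illustration; the answer does not say which smoothing method to use. With this formula, a term never seen in training gets an IDF that depends only on the number of training documents `N`.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute smoothed IDF values from tokenized training documents.

    Returns (idf_by_term, default_idf), where default_idf is the value
    used for terms that never appeared in the training set (df = 0).
    """
    n_docs = len(train_docs)
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))  # count each term once per document

    # Assumed smoothing: log((1 + N) / (1 + df)) + 1
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    default_idf = math.log((1 + n_docs) / 1) + 1  # unseen term, df = 0
    return idf, default_idf


def tfidf(tokens, idf, default_idf):
    """Weight one incoming test document using the frozen training IDF."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, default_idf) for term, count in tf.items()}


# Toy training set (hypothetical data).
train_docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
idf, default_idf = fit_idf(train_docs)

# A test document arriving later, e.g. from a crawler; "meowed" is unseen.
weights = tfidf(["the", "cat", "meowed"], idf, default_idf)
```

Because the IDF table is frozen after training, each test document can be weighted on its own as it arrives, without waiting for the rest of the test set.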

Tags: text, classification, information-retrieval, tf-idf

samsamara
