"max_df corresponds to < documents than min_df" error in Ridge classifier

I trained a Ridge classifier on a large amount of data, using a TF-IDF vectorizer to vectorize it, and it used to work fine. But now I am facing this error:

'max_df corresponds to < documents than min_df'

The data is stored in MongoDB.
I tried various options to solve it, and finally, when I deleted a MongoDB collection that contained only 1 document (1 record), it worked normally and completed the training as usual.

But I need a solution which does not require deleting the record as I need that record.

Also, I do not understand the error, as it occurs only on my machine. The script used to work fine on my system even while this record was present in the DB, and it works fine on other systems as well.

Could someone help please?

athi_nn asked Oct 03 '16 09:10



1 Answer

That error is telling you that, once both settings are expressed as a number of documents, your max_df value is less than your min_df value. For example:

max_df = 0.7 # Ignore terms that appear in more than 70% of the documents

min_df = 5 # Terms must appear in at least 5 documents to be considered

Suppose that the total number of documents in your corpus is 7. Then max_df corresponds to 0.7 * 7 = 4.9 documents, while min_df is still 5. So max_df < min_df, which should never happen because it means that 0 terms would be considered: no term can have a DF that is at least 5 and at most 4.9.
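The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration of the comparison described in this answer, not scikit-learn's own code: the helper names `df_bounds` and `bounds_conflict` are hypothetical, and the sketch assumes that an integer value is read as an absolute document count and a float as a fraction of the corpus size.

```python
from numbers import Integral


def df_bounds(n_docs, min_df, max_df):
    """Express min_df and max_df as document counts (hypothetical helper).

    Assumption: an int is an absolute count, a float is a fraction of n_docs.
    """
    min_count = min_df if isinstance(min_df, Integral) else min_df * n_docs
    max_count = max_df if isinstance(max_df, Integral) else max_df * n_docs
    return min_count, max_count


def bounds_conflict(n_docs, min_df, max_df):
    """Return True when no term could satisfy both thresholds."""
    min_count, max_count = df_bounds(n_docs, min_df, max_df)
    return max_count < min_count


# The example from this answer: 7 documents, min_df=5, max_df=0.7.
print(bounds_conflict(7, min_df=5, max_df=0.7))    # conflict: about 4.9 < 5
# The same settings on a larger corpus do not conflict.
print(bounds_conflict(100, min_df=5, max_df=0.7))  # no conflict: 70 >= 5
```

A check like this could be run on the number of documents before fitting the vectorizer. If the vectorizer ends up being fit on a very small set of documents (for instance, a collection holding a single record, as in the question), an integer min_df of 2 or more combined with a fractional max_df would conflict in the same way, so lowering min_df or expressing it as a fraction may avoid the error.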

Andres answered Sep 24 '22 15:09