
Classify words as "good" and "bad"

I have a list of domain names and want to determine whether each name looks like it belongs to a porn site. What is the best way to do this? A list of known porn domains is at http://dumpz.org/56957/ ; these domains can be used to teach the system what porn domain names look like. I also have another list, http://dumpz.org/56960/ , in which many domains are also porn, and I want to identify them by name alone.

asked Dec 17 '22 13:12 by Mykola Kharechko


1 Answer

Use a Bayesian filter, e.g. SpamBayes or Divmod's Reverend. Train it on the list you have, and it can then score how likely a given domain is to be porn.

For a short overview, look at this article.
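SpamBayes and Reverend each have their own training API, which is not reproduced here. As a library-free sketch of the same idea (not the exact scoring either library uses), the Python below trains a plain multinomial naive Bayes classifier with add-one smoothing. Because domain names contain no spaces, it assumes character trigrams as the "words". The names `ngrams` and `DomainBayes`, the "bad"/"good" labels, the trigram size, and the example domains are all hypothetical. It also assumes you have a second list of known non-porn domains to train the "good" class, since the question only supplies porn examples.

```python
import math
from collections import Counter


def ngrams(domain, n=3):
    """Split a domain name into overlapping character n-grams.

    Domain names have no spaces, so character n-grams stand in for words.
    Any scheme, path, leading "www." and the last label (the TLD) are
    dropped; "^" and "$" mark the start and end of the name.
    """
    name = domain.lower().split("//")[-1].split("/")[0]
    if name.startswith("www."):
        name = name[4:]
    name = name.rsplit(".", 1)[0]
    padded = "^" + name + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class DomainBayes:
    """Hypothetical multinomial naive Bayes over n-gram counts, add-one smoothed."""

    LABELS = ("bad", "good")

    def __init__(self, n=3):
        self.n = n
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def train(self, domain, label):
        """Add one labelled domain ("bad" or "good") to the model."""
        self.counts[label].update(ngrams(domain, self.n))
        self.docs[label] += 1

    def score(self, domain):
        """Return the estimated probability that `domain` is "bad"."""
        # +1 reserves probability mass for n-grams never seen in training.
        vocab_size = len(set(self.counts["bad"]) | set(self.counts["good"])) + 1
        total_docs = sum(self.docs.values())
        grams = ngrams(domain, self.n)
        log_post = {}
        for label in self.LABELS:
            counter = self.counts[label]
            total = sum(counter.values())
            # Smoothed class prior, so an empty class does not give log(0).
            lp = math.log((self.docs[label] + 1) / (total_docs + len(self.LABELS)))
            for g in grams:
                lp += math.log((counter[g] + 1) / (total + vocab_size))
            log_post[label] = lp
        # Normalise in log space to avoid underflow on long names.
        top = max(log_post.values())
        weights = {k: math.exp(v - top) for k, v in log_post.items()}
        return weights["bad"] / sum(weights.values())


if __name__ == "__main__":
    # Placeholder training data; replace with your real "bad" and "good" lists.
    model = DomainBayes()
    for d in ["freexxxvideos.com", "xxxtube.net", "hotxxxpics.org"]:
        model.train(d, "bad")
    for d in ["weatherforecast.com", "cookingrecipes.net", "citynews.org"]:
        model.train(d, "good")
    for d in ["bestxxxclips.com", "dailyweather.com"]:
        print(d, round(model.score(d), 3))
```

In practice you would train the "bad" class on every domain from the first list and the "good" class on a comparable set of clean domains, then flag domains from the second list whose score exceeds a threshold you tune by hand. Character n-grams are one choice among several; splitting names into dictionary words would be an alternative.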

answered Jan 02 '23 15:01 by jazz