I am building a search engine (as a learning project) and I want to know how Google recognizes adult content and images with SafeSearch (http://en.wikipedia.org/wiki/Safesearch).
The programming language doesn't matter; I only want to know the general approach, independent of any particular language.
If the rules for any sort of content filter fell into the hands of people trying to get that content through the filter, the filter would become ineffective.
So I imagine that Google's rules (1) are not publicly available and (2) change frequently.
That said, starting with a small blacklist of adult sites and following their outgoing links (and/or finding sites that link to the blacklisted sites) would probably find a huge number of adult sites, but by no means all of them. You'd also want some sort of text processing and image recognition algorithms.
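To make the link-following idea concrete, here is a minimal sketch in Python. It is not Google's method (which, as noted, isn't public); it only illustrates expanding a seed blacklist over a link graph. The function name `expand_blacklist`, the `max_hops` limit, and the in-memory `links` dictionary with `.example` domains are all hypothetical stand-ins for whatever your crawler actually stores.

```python
from collections import deque

def expand_blacklist(seed_sites, outgoing_links, max_hops=2):
    """Return the seed sites plus every site reachable within max_hops links."""
    flagged = set(seed_sites)
    queue = deque((site, 0) for site in seed_sites)
    while queue:
        site, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in outgoing_links.get(site, ()):
            if target not in flagged:
                flagged.add(target)
                queue.append((target, hops + 1))
    return flagged

# Hypothetical crawl data: site -> sites it links to.
links = {
    "adult-a.example": ["adult-b.example", "news.example"],
    "adult-b.example": ["adult-c.example"],
    "news.example": ["weather.example"],
}

candidates = expand_blacklist({"adult-a.example"}, links, max_hops=1)
print(sorted(candidates))
```

Note that `news.example` ends up in the result even though it is presumably harmless. That is why the output is best treated as a list of candidates to check with text and image analysis, not as a final verdict.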
NOTE: A popular theory is that adult content providers pay people to ask questions on stackoverflow.com so that Jon Skeet and Marc Gravell will have less time to update the SafeSearch filters. However, it is easily shown that Jon and Marc answer questions at such a high rate that any such strategy would not be economically viable.