There is a list of banned words ( or strings to be more general) and another list with let's say users mails. I would like to excise all banned words from all mails.
Trivial example:
foreach(string word in wordsList)
{
foreach(string mail in mailList)
{
mail.Replace(word,String.Empty);
}
}
How I can improve this algorithm?
Thanks for advices. I voted few answers up but I didn't mark any as answer since it was more like discussion than solution. Some people missed banned words with bad words. In my case I don't have to bother about recognize 'sh1t' or something like that.
Simple approaches to profanity filtering won't work - complex approaches don't work, for the most part, either.
What happens when you get a work like 'password' and you want to filter out 'ass'? What happens when some clever person writes 'a$$' instead - the intent is still clear, right?
See How do you implement a good profanity filter? for extensive discussion.
You could use RegEx to make things a little cleaner:
var bannedWords = @"\b(this|is|the|list|of|banned|words)\b";
foreach(mail in mailList)
var clean = Regex.Replace(mail, bannedWords, "", RegexOptions.IgnoreCase);
Even that, though, is far from perfect since people will always figure out a way around any type of filter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With