Since Django doesn't handle filtering profanities, does anyone have suggestions for an easy way to implement profanity filtering (perhaps with some natural language processing) in Django?
Django does handle filtering profanities.
From https://docs.djangoproject.com/en/1.4/ref/settings/#profanities-list:
PROFANITIES_LIST
Default: () (Empty tuple)
A tuple of profanities, as strings, that will be forbidden in comments when COMMENTS_ALLOW_PROFANITIES is False.
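As a rough sketch, the two settings quoted above would sit together in settings.py something like this. It assumes Django 1.4 with the bundled comments app (the docs version linked above); later releases may not have these settings, so check the docs for your version. The words are placeholders.

```python
# settings.py (fragment): assumes Django 1.4 with django.contrib.comments.
# Placeholder words; replace them with your own list.
PROFANITIES_LIST = ('spam', 'eggs')

# When False, comments containing a word from PROFANITIES_LIST are forbidden.
COMMENTS_ALLOW_PROFANITIES = False
```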
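That said, you'll still need to populate that list yourself. Some links to get started.
I would also recommend familiarizing yourself with the Scunthorpe problem, where innocent words get blocked because they happen to contain a profane substring.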
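To make the Scunthorpe problem concrete, here is a small illustration comparing a naive substring check with a whole-word check. The word list and sample text are made up for the example, and this is only a sketch of the idea, not how any particular library does it.

```python
import re

# Hypothetical block list and sample text, for illustration only.
blocked = ['spam']
text = 'My inbox is full of spammers.'

# Naive substring check: flags any text that merely contains a blocked term.
naive_hits = [w for w in blocked if w in text.lower()]

# Whole-word check: \b only matches where the term stands alone as a word.
pattern = re.compile(
    r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in blocked),
    re.IGNORECASE,
)
bounded_hits = pattern.findall(text)

print(naive_hits)    # ['spam']
print(bounded_hits)  # []
```

The substring check trips on "spammers" even though the blocked word never appears on its own, which is exactly the kind of false positive to watch for.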
Personally, I'd say don't bother. If you build better filters, users will simply spell the words differently.
But here's a simple example:
import re

bad_words = ['spam', 'eggs']

# \b is a word boundary, so only whole words match. That avoids
# Scunthorpe-style false positives: http://en.wikipedia.org/wiki/Scunthorpe_problem
# re.escape keeps regex metacharacters in a word from being interpreted.
pattern = re.compile(
    r'\b(%s)\b' % '|'.join(re.escape(word) for word in bad_words),
    re.IGNORECASE,
)

some_text = "This text contains some profane words like spam and eggs. But it won't match spammy stuff."
print(some_text)
# This text contains some profane words like spam and eggs. But it won't match spammy stuff.

clean_text = pattern.sub('XXX', some_text)
print(clean_text)
# This text contains some profane words like XXX and XXX. But it won't match spammy stuff.
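If you want to reuse this, one option is to wrap the same regex approach in a couple of small functions. The names build_pattern and censor are made up for this sketch. The second call also shows the weakness mentioned earlier: trivially obfuscated spellings pass straight through.

```python
import re

def build_pattern(bad_words):
    """Compile one case-insensitive, whole-word pattern for all bad words."""
    escaped = (re.escape(w) for w in bad_words)
    return re.compile(r'\b(?:%s)\b' % '|'.join(escaped), re.IGNORECASE)

def censor(text, pattern, mask='XXX'):
    """Replace every whole-word match with the mask."""
    return pattern.sub(mask, text)

pattern = build_pattern(['spam', 'eggs'])  # placeholder words

print(censor('Spam and EGGS for breakfast', pattern))
# XXX and XXX for breakfast

# Obfuscated spellings are not caught by a plain word list.
print(censor('sp4m and e.g.g.s for breakfast', pattern))
# sp4m and e.g.g.s for breakfast
```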