Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services.
(And whilst it’s always great to hear your fundamental objections of principle to profanity filtering, I’m not specifically looking for them here. I know profanity filtering can’t pick up every hurtful thing being said. I know swearing, in the grand scheme of things, isn’t a particularly big issue. I know you need some human input to deal with issues of content. I’d just like to find a good library, and see what use I can make of it.)
The Netflix Profanity Filter is a browser plugin, like the Advanced Profanity Filter. It works in the same way too. They source the subtitles of a show or movie, and censor words in the text. The moment that censored text syncs up with the audio of the content, they also mute that.
UFIlter is an incredible new tool that lets you watch content from all of the major streaming services without all of the offensive words! For the price of just one movie, you get the ability to filter the language out of over 100,000 movies and TV episodes!
A profanity filter is a type of software that scans user-generated content (UGC) to filter out profanity within online communities, social platforms, marketplaces, and more. Moderators decide on which words to censor, including swear words, words associated with hate speech, harassment, etc.
If you help manage a Facebook Page, you may use the profanity filter to hide comments with profanity from your Page. We determine what to hide by using the most commonly reported words and phrases marked offensive by the community.
I didn't find any Python profanity library, so I made one myself.
filterlist
A list of regular expressions that match forbidden words. Please do not use \b; it will be inserted depending on inside_words.
Example: ['bad', 'un\w+']
ignore_case
Default: True
Self-explanatory.
replacements
Default: "$@%-?!"
A string of characters from which the replacement strings are randomly generated.
Examples: "%&$?!" or "-", etc.
complete
Default: True
Controls if the entire string will be replaced or if the first and last chars will be kept.
inside_words
Default: False
Controls if words are searched inside other words too. Disabling this
(examples at the end)
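To make the interaction between filterlist, ignore_case and inside_words more concrete, here is a minimal sketch using only the standard re module. build_pattern is a hypothetical helper written for illustration, not part of the class; it only shows how the expressions might be joined and why \b should not be put in the list by hand.

```python
import re


def build_pattern(filterlist, ignore_case=True, inside_words=False):
    """Join the filter expressions into one alternation (hypothetical helper)."""
    alternation = "|".join(filterlist)
    # Word boundaries are added only when matches inside other words are unwanted.
    template = r"(%s)" if inside_words else r"\b(%s)\b"
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(template % alternation, flags)


text = "I am doing bad ungood badlike things."
whole_words = build_pattern(["bad", r"un\w+"])
anywhere = build_pattern(["bad", r"un\w+"], inside_words=True)

whole_matches = whole_words.findall(text)
anywhere_matches = anywhere.findall(text)
print(whole_matches)     # ['bad', 'ungood']
print(anywhere_matches)  # ['bad', 'ungood', 'bad']
```

With inside_words left off, "badlike" is not touched because there is no word boundary after "bad"; switching it on matches the "bad" inside it as well.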
""" Module that provides a class that filters profanities """ __author__ = "leoluk" __version__ = '0.0.1' import random import re class ProfanitiesFilter(object): def __init__(self, filterlist, ignore_case=True, replacements="$@%-?!", complete=True, inside_words=False): """ Inits the profanity filter. filterlist -- a list of regular expressions that matches words that are forbidden ignore_case -- ignore capitalization replacements -- string with characters to replace the forbidden word complete -- completely remove the word or keep the first and last char? inside_words -- search inside other words? """ self.badwords = filterlist self.ignore_case = ignore_case self.replacements = replacements self.complete = complete self.inside_words = inside_words def _make_clean_word(self, length): """ Generates a random replacement string of a given length using the chars in self.replacements. """ return ''.join([random.choice(self.replacements) for i in range(length)]) def __replacer(self, match): value = match.group() if self.complete: return self._make_clean_word(len(value)) else: return value[0]+self._make_clean_word(len(value)-2)+value[-1] def clean(self, text): """Cleans a string from profanity.""" regexp_insidewords = { True: r'(%s)', False: r'\b(%s)\b', } regexp = (regexp_insidewords[self.inside_words] % '|'.join(self.badwords)) r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0) return r.sub(self.__replacer, text) if __name__ == '__main__': f = ProfanitiesFilter(['bad', 'un\w+'], replacements="-") example = "I am doing bad ungood badlike things." print f.clean(example) # Returns "I am doing --- ------ badlike things." f.inside_words = True print f.clean(example) # Returns "I am doing --- ------ ---like things." f.complete = False print f.clean(example) # Returns "I am doing b-d u----d b-dlike things."