
What’s a good Python profanity filter library? [closed]

Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services.

(And whilst it’s always great to hear your fundamental objections of principle to profanity filtering, I’m not specifically looking for them here. I know profanity filtering can’t pick up every hurtful thing being said. I know swearing, in the grand scheme of things, isn’t a particularly big issue. I know you need some human input to deal with issues of content. I’d just like to find a good library, and see what use I can make of it.)

Paul D. Waite asked Aug 20 '10 14:08



1 Answer

I couldn't find a Python profanity library, so I wrote one myself.

Parameters


filterlist

A list of regular expressions, each matching a forbidden word. Do not add \b yourself; it is inserted automatically depending on inside_words.

Example: ['bad', 'un\w+']
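
To illustrate how a list like this can become a single search pattern, here is a minimal sketch using only the standard `re` module. The helper name `build_pattern` is hypothetical and chosen for illustration; it mirrors the idea of joining the entries with `|` and adding `\b` only when matching whole words.

```python
import re


def build_pattern(filterlist, inside_words=False, ignore_case=True):
    """Join the filter entries into one compiled alternation (illustrative helper)."""
    alternation = '|'.join(filterlist)
    # Word boundaries are added only when matches inside other words are unwanted.
    template = r'(%s)' if inside_words else r'\b(%s)\b'
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(template % alternation, flags)


whole_words = build_pattern(['bad', r'un\w+'])
print(whole_words.findall("Bad ungood badlike"))  # ['Bad', 'ungood']

anywhere = build_pattern(['bad', r'un\w+'], inside_words=True)
print(anywhere.findall("Bad ungood badlike"))     # ['Bad', 'ungood', 'bad']
```

Because every entry is treated as a regular expression, a literal word that contains metacharacters such as `.` or `*` would probably need to go through `re.escape` first.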

ignore_case

Default: True

Match forbidden words regardless of capitalization.

replacements

Default: "$@%-?!"

A string of characters from which the replacement strings are randomly generated.

Examples: "%&$?!" or "-" etc.
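
As a rough sketch of how such a replacement string might be generated, the snippet below draws one random character per position. The function name `make_mask` and its `rng` parameter are assumptions made for this example; passing a seeded `random.Random` is one way to get repeatable output in tests.

```python
import random


def make_mask(length, replacements="$@%-?!", rng=random):
    """Build a string of `length` characters drawn at random from `replacements`."""
    return ''.join(rng.choice(replacements) for _ in range(length))


# With a single replacement character the result is fully predictable.
print(make_mask(6, "-"))  # ------

# With several characters the output varies; a seeded generator makes it repeatable.
seeded = random.Random(0)
print(make_mask(6, rng=seeded))
```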

complete

Default: True

Controls whether the entire matched word is replaced or its first and last characters are kept.
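
The difference between the two modes can be sketched with a small standalone function. `mask_word` and its fixed `fill` character are hypothetical names used only here, and the handling of words shorter than three characters (masking them entirely, since they have no interior) is an assumption of this sketch.

```python
def mask_word(word, complete=True, fill="-"):
    """Mask a word entirely, or keep its first and last characters."""
    # Assumption: words shorter than 3 characters are always masked in full.
    if complete or len(word) < 3:
        return fill * len(word)
    return word[0] + fill * (len(word) - 2) + word[-1]


print(mask_word("ungood"))                  # ------
print(mask_word("ungood", complete=False))  # u----d
print(mask_word("ab", complete=False))      # --
```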

inside_words

Default: False

Controls if words are searched inside other words too. Disabling this

Module source


(examples at the end)

"""
Module that provides a class that filters profanities

"""

__author__ = "leoluk"
__version__ = '0.0.1'

import random
import re


class ProfanitiesFilter(object):
    def __init__(self, filterlist, ignore_case=True, replacements="$@%-?!",
                 complete=True, inside_words=False):
        """
        Inits the profanity filter.

        filterlist -- a list of regular expressions that
        matches words that are forbidden
        ignore_case -- ignore capitalization
        replacements -- string with characters to replace the forbidden word
        complete -- completely remove the word or keep the first and last char?
        inside_words -- search inside other words?

        """

        self.badwords = filterlist
        self.ignore_case = ignore_case
        self.replacements = replacements
        self.complete = complete
        self.inside_words = inside_words

    def _make_clean_word(self, length):
        """
        Generates a random replacement string of a given length
        using the chars in self.replacements.

        """
        return ''.join([random.choice(self.replacements) for i in
                        range(length)])

    def __replacer(self, match):
        value = match.group()
        # Matches shorter than 3 characters have no interior to mask,
        # so they are replaced entirely (avoids doubling a 1-char match).
        if self.complete or len(value) < 3:
            return self._make_clean_word(len(value))
        else:
            return value[0]+self._make_clean_word(len(value)-2)+value[-1]

    def clean(self, text):
        """Cleans a string from profanity."""

        # \b word boundaries are added only when inside_words is False.
        regexp_insidewords = {
            True: r'(%s)',
            False: r'\b(%s)\b',
            }

        regexp = (regexp_insidewords[self.inside_words] %
                  '|'.join(self.badwords))

        r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0)

        return r.sub(self.__replacer, text)


if __name__ == '__main__':

    # Raw string so that \w reaches the regex engine unchanged.
    f = ProfanitiesFilter(['bad', r'un\w+'], replacements="-")
    example = "I am doing bad ungood badlike things."

    print(f.clean(example))
    # Prints "I am doing --- ------ badlike things."

    f.inside_words = True
    print(f.clean(example))
    # Prints "I am doing --- ------ ---like things."

    f.complete = False
    print(f.clean(example))
    # Prints "I am doing b-d u----d b-dlike things."
leoluk answered Oct 13 '22 17:10