
Fastest way to check whether a string contains any word from a list

I have a Python application.

I have a list of 450 prohibited phrases and a message received from a user. I want to check whether the message contains any of the prohibited phrases. What is the fastest way to do that?

Currently I have this code:

message = "sometext"
lista = ["a", "b", "c"]

isContaining = False

for member in lista:
    # "in" tests for a substring; str has no contains() method
    if member in message:
        isContaining = True
        break

Is there a faster way to do this? I need to process a message (at most 500 characters) in less than 1 second.

TN888 asked Jan 05 '15 14:01



2 Answers

The built-in any function is made for exactly this:

>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> any(a in message for a in lista)
False
>>> lista = ["a","b","e"]
>>> any(a in message for a in lista)
True
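To see how this holds up against the sizes in the question, here is a rough sketch that wraps the check in a function and times it. The function name contains_any and the generated phrases are made up for illustration; the real list would hold the 450 prohibited phrases, and the timing depends on the machine.

```python
import timeit


def contains_any(message, phrases):
    """Return True if any phrase occurs as a substring of message."""
    return any(phrase in message for phrase in phrases)


# Hypothetical data sized like the question: 450 phrases, a 500-character message.
phrases = ["forbidden phrase %d" % i for i in range(450)]
message = "x" * 500  # worst case: nothing matches, so every phrase is checked

print(contains_any(message, phrases))                          # False
print(contains_any(message + " forbidden phrase 7", phrases))  # True

# Average seconds per call over 1000 runs; the value varies by machine.
seconds = timeit.timeit(lambda: contains_any(message, phrases), number=1000) / 1000
print(seconds)
```

any stops at the first hit, so the message with no match is the slowest case to measure.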

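Alternatively, you could check the intersection of the sets (set(message) is the set of the message's characters; to compare whole words, build the sets from lists of words):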

>>> lista = ["a","b","c"]
>>> set(message) & set(lista)
set([])
>>> lista = ["a","b","e"]
>>> set(message) & set(lista)
set(['e'])
>>> set(['test','sentence'])&set(['this','is','my','sentence'])
set(['sentence'])

But this way you cannot find a word inside a longer string:

>>> set(['test','sentence'])&set(['this is my sentence'])
set([])
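One way around that, as a sketch only: split the message into words before building the set. The helper name prohibited_words is made up here, and the output is shown in Python 3 form. It still cannot find multi-word phrases, which need a substring test such as the any version.

```python
def prohibited_words(message, words):
    """Return the prohibited words that occur as whole words in message."""
    # str.split() breaks on whitespace only, so punctuation stays attached to words.
    return set(message.split()) & set(words)


print(prohibited_words("this is my sentence", ["test", "sentence"]))  # {'sentence'}
print(prohibited_words("this is my sentence", ["my sentence"]))       # set()
```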
fredtantini answered Sep 19 '22 12:09



Using a regex compiled from the list

Keep the memory use and build time of the expression in mind, and compile it once in advance.

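import re

lista = [...]
# Escape each phrase so regex metacharacters are matched literally
lista_escaped = [re.escape(item) for item in lista]
# Flags go to re.compile; the second argument of a compiled pattern's search() is a start position
bad_match = re.compile('|'.join(lista_escaped), re.IGNORECASE)
is_bad = bad_match.search(message) is not None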
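For a concrete run, here is a small sketch with a made-up list of three phrases. The names prohibited and find_prohibited are hypothetical; the real list would hold the 450 prohibited phrases. The pattern is built once at import time and reused for every message.

```python
import re

# Hypothetical phrases, for illustration only.
prohibited = ["bad phrase", "c++", "forbidden"]

# re.escape keeps characters such as "+" literal; the pattern is compiled once.
pattern = re.compile("|".join(re.escape(p) for p in prohibited), re.IGNORECASE)


def find_prohibited(message):
    """Return the first prohibited phrase found in message, or None."""
    match = pattern.search(message)
    return match.group(0) if match else None


print(find_prohibited("I love C++ a lot"))  # C++
print(find_prohibited("nothing to see"))    # None
```

Returning the matched text rather than a bare boolean makes it easy to log which phrase triggered the block.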
shevski answered Sep 20 '22 12:09
