Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python re infinite execution

Tags:

I'm trying to execute this code :

import re pattern = r"(\w+)\*([\w\s]+)*/$" re_compiled = re.compile(pattern) results = re_compiled.search('COPRO*HORIZON 2000                 HOR') print(results.groups()) 

But Python does not respond. The process takes 100% of the CPU and does not stop. I've tried this both on Python 2.7.1 and Python 3.2 with identical results.

like image 831
wonzbak Avatar asked Nov 04 '11 13:11

wonzbak


1 Answers

Your regex runs into catastrophic backtracking because you have nested quantifiers (([...]+)*). Since your regex requires the string to end in / (which fails on your example), the regex engine tries all permutations of the string in the vain hope to find a matching combination. That's where it gets stuck.

To illustrate, let's assume "A*BCD" as the input to your regex and see what happens:

  1. (\w+) matches A. Good.
  2. \* matches *. Yay.
  3. [\w\s]+ matches BCD. OK.
  4. / fails to match (no characters left to match). OK, let's back up one character.
  5. / fails to match D. Hum. Let's back up some more.
  6. [\w\s]+ matches BC, and the repeated [\w\s]+ matches D.
  7. / fails to match. Back up.
  8. / fails to match D. Back up some more.
  9. [\w\s]+ matches B, and the repeated [\w\s]+ matches CD.
  10. / fails to match. Back up again.
  11. / fails to match D. Back up some more, again.
  12. How about [\w\s]+ matches B, repeated [\w\s]+ matches C, repeated [\w\s]+ matches D? No? Let's try something else.
  13. [\w\s]+ matches BC. Let's stop here and see what happens.
  14. Darn, / still doesn't match D.
  15. [\w\s]+ matches B.
  16. Still no luck. / doesn't match C.
  17. Hey, the whole group is optional (...)*.
  18. Nope, / still doesn't match B.
  19. OK, I give up.

Now that was a string of just three letters. Yours had about 30, trying all permutations of which would keep your computer busy until the end of days.

I suppose what you're trying to do is to get the strings before/after *, in which case, use

pattern = r"(\w+)\*([\w\s]+)$" 
like image 109
Tim Pietzcker Avatar answered Oct 07 '22 12:10

Tim Pietzcker