
Remove duplicate words using regex in Python

Tags:

python

regex

I need to remove repeated words from a string so that 'the (the)' becomes 'the'. Why doesn't the following work?

re.sub('(.+) \(\1\)', '\1', 'the (the)')

Thanks.

jackhab asked Feb 24 '23 17:02


1 Answer

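In a normal string literal, Python turns '\1' into the single character chr(1) before the regex engine ever sees it, so no back-reference reaches re.sub. You need to escape the backslashes: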

re.sub('(.+) \\(\\1\\)', '\\1', 'the (the)')
--> the
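
To see why the original call does nothing, it can help to inspect what the string literals actually contain. The following is a minimal sketch; the variable names (plain, escaped, raw, broken, fixed) are only for illustration and are not part of the question or the answer.

```python
import re

plain = '\1'      # octal escape: one character, chr(1)
escaped = '\\1'   # two characters: a backslash and the digit 1
raw = r'\1'       # raw string: the same two characters as `escaped`

print(len(plain), len(escaped), len(raw))
print(plain == chr(1), escaped == raw)

# The question's pattern contains chr(1) instead of a back-reference,
# so nothing matches and the input comes back unchanged.
broken = re.sub('(.+) \\(\1\\)', '\1', 'the (the)')

# With the backslash escaped, the regex engine receives \1 as a back-reference.
fixed = re.sub('(.+) \\(\\1\\)', '\\1', 'the (the)')

print(repr(broken), repr(fixed))
```

The parentheses are written as \\( and \\) here only to avoid the invalid-escape warning that recent Python versions emit for '\(' in a non-raw string.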

Or use the r prefix (a raw string), so the backslashes reach the regex engine unchanged:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string.

re.sub(r'(.+) \(\1\)', r'\1', 'the (the)')
--> the
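
If the same fix has to run over longer text, the raw-string pattern could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name remove_parenthesized_repeats is hypothetical, and swapping (.+) for \b(\w+) is my own tightening so that only whole words are matched, not something the answer above prescribes.

```python
import re

# Assumption: a "word" is a run of \w characters, and the repeat appears
# in parentheses right after it, separated by one space.
REPEAT = re.compile(r'\b(\w+) \(\1\)')


def remove_parenthesized_repeats(text):
    """Replace every 'word (word)' in text with just 'word'."""
    return REPEAT.sub(r'\1', text)


print(remove_parenthesized_repeats('the (the) cat sat on a (a) mat'))
print(remove_parenthesized_repeats('cathe (he)'))  # left as-is: 'he' is not a whole word here
```

With the original (.+) group, the second input would lose its " (he)" because "he" at the end of "cathe" matches the parenthesized text; the word boundary prevents that.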

jensgram answered Feb 27 '23 05:02