
Removing element from a list by a regexp in Python

Tags: python, regex

I am trying to remove a string that is in parentheses from a list in Python without success.

See following code:

import re
full = ['webb', 'ellis', '(sportswear)']
regex = re.compile(r'\b\(.*\)\b')
filtered = [i for i in full if not regex.search(i)]

Returns:

['webb', 'ellis', '(sportswear)']

Could somebody point out my mistake?

Istvan asked May 12 '16 12:05




3 Answers

The \b word boundary makes it impossible to match ( at the beginning of a string, since there is no word character before it: for \b to match right before ( in your pattern, a letter, digit or underscore must precede the (, and that is not the case here. The trailing \b fails the same way, because nothing follows the ).
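To see the boundary behaviour in isolation, here is a minimal sketch. The second input, 'a(sportswear)b', is a made-up string that is not in the question; it only shows what the original pattern would need in order to match.

```python
import re

# Pattern from the question: \b on both sides of the parentheses.
bounded = re.compile(r'\b\(.*\)\b')

# No word character precedes "(" or follows ")", so neither \b can match.
bare = bounded.search('(sportswear)')

# Hypothetical input: a letter on each side gives each \b a boundary to match.
embedded = bounded.search('a(sportswear)b')

print(bare)              # None
print(embedded.group())  # (sportswear)
```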

Since you confirmed that you need to match values that are fully enclosed in (...), you need regex = re.compile(r'\(.*\)$') together with re.match.

Use

import re
full = ['webb', 'ellis', '(sportswear)']
regex = re.compile(r'\(.*\)$')
filtered = [i for i in full if not regex.match(i)]
print(filtered)

See the IDEONE demo

re.match anchors the match at the start of the string, and $ anchors it at the end of the string.
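As an alternative sketch (not part of the original answer), re.fullmatch, available since Python 3.4, requires the whole string to match, so the trailing $ can be dropped. The element '(a) b' is invented here to show the difference between anchoring at both ends and anchoring only at the start.

```python
import re

# Same pattern without the trailing $; fullmatch requires the whole string to match.
parenthesized = re.compile(r'\(.*\)')

# '(a) b' is a made-up element that only starts with a parenthesized part.
full = ['webb', 'ellis', '(sportswear)', '(a) b']

# Anchored at both ends: only fully parenthesized elements are dropped.
filtered = [i for i in full if not parenthesized.fullmatch(i)]

# Anchored at the start only: '(a) b' is dropped as well.
prefix_only = [i for i in full if not parenthesized.match(i)]

print(filtered)     # ['webb', 'ellis', '(a) b']
print(prefix_only)  # ['webb', 'ellis']
```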

Note that if your strings contain newlines, pass flags=re.DOTALL when compiling the regex so that . also matches newline characters.
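A small sketch of the difference, assuming a made-up value with a newline inside the parentheses:

```python
import re

multiline = '(sports\nwear)'  # hypothetical value containing a newline

default_dot = re.compile(r'\(.*\)$')                   # . stops at the newline
dotall_dot = re.compile(r'\(.*\)$', flags=re.DOTALL)   # . matches the newline too

without_flag = default_dot.match(multiline)
with_flag = dotall_dot.match(multiline)

print(without_flag)           # None
print(with_flag is not None)  # True
```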

Wiktor Stribiżew answered Oct 26 '22 07:10


>>> import re
>>> full = ['webb', 'ellis', '(sportswear)']
>>> x = list(filter(None, [re.sub(r".*\(.*\).*", r"", i) for i in full]))
>>> x
['webb', 'ellis']

Mayur Koshti answered Oct 26 '22 07:10


For my use case, this worked. Maybe it will be useful for someone with the same problem.

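import re
# obj is whatever object you are inspecting; dir() returns its attribute names
doc_list = dir(obj)
# drop dunder names such as __init__ or __repr__
regex = re.compile(r'^__\w*__$')
filtered = [ele for ele in doc_list if not regex.match(ele)]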

Sudip Kandel answered Oct 26 '22 08:10