 

In Python, how do I remove from a list any element containing certain kinds of characters?


Apologies if this is a simple question, I'm still pretty new to this, but I've spent a while looking for an answer and haven't found anything. I have a list that looks something like this horrifying mess:

['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} '] 

I need to process it so that HTML.py can turn the information in it into a table. For some reason, HTML.py simply can't handle the monster elements (e.g. 'class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', etc.). Fortunately for me, I don't actually care about the information in the monster elements and want to get rid of them.

I tried writing a regex that would match all more-than-two-letter all-caps words, to identify the monster elements, and got this:

re.compile('[^a-z]*[A-Z][^a-z]*\w{3,}') 

But I don't know how to use that to delete from the list the elements that contain a match for the regex. How would I do that, and is that the right way to go about it?

RSid asked Aug 10 '11 16:08




1 Answer

I think your regex is incorrect. To match all entries that contain an all-caps word of three or more characters, you should use something like this with re.search:

regex = re.compile(r'\b[A-Z]{3,}\b') 
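To see the difference between the two patterns, here is a small sketch. The variable names and the shortened monster string are mine, not from the question. It suggests that the original pattern also flags elements you want to keep, and that re.search is needed because the all-caps words are not at the start of the string:

```python
import re

# Pattern from the question and the pattern suggested above.
original = re.compile(r'[^a-z]*[A-Z][^a-z]*\w{3,}')
suggested = re.compile(r'\b[A-Z]{3,}\b')

keep = 'Organization name} '
address = ' class="lsn-mB6 adr">1 Address, MA 02114 } '
# Shortened version of a monster element (assumption: same shape as the real one).
monster = ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP'

# The original pattern matches a capital letter followed by word characters,
# so it also hits 'Organization' in an element that should be kept.
original_hits_keep = original.search(keep) is not None

# The suggested pattern needs a whole word of three or more capitals.
suggested_hits_keep = suggested.search(keep) is not None
suggested_hits_address = suggested.search(address) is not None  # 'MA' is too short

# search() scans the whole string; match() only tries at position 0.
first_word = suggested.search(monster).group()
match_at_start = suggested.match(monster)

print(original_hits_keep, suggested_hits_keep, suggested_hits_address)
print(first_word, match_at_start)
```

Note that the two-letter state abbreviation in the address elements is not matched, which is why those elements survive the filter.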

With that you can filter using a list comprehension or the filter built-in function:

import re
full = [
    'Organization name} ',
    '> (777) 777-7777} ',
    ' class="lsn-mB6 adr">1 Address, MA 02114 } ',
    ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ',
    'Other organization} ',
    '> (555) 555-5555} ',
    ' class="lsn-mB6 adr">301 Address, MA 02121 } ',
    ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ',
    'Organization} ',
]
regex = re.compile(r'\b[A-Z]{3,}\b')
# use only one of the following lines, whichever you prefer
# (on Python 3, filter returns an iterator, hence the list() call)
filtered = list(filter(lambda i: not regex.search(i), full))
filtered = [i for i in full if not regex.search(i)]

This results in the following list, which I think is what you are looking for:

>>> pprint.pprint(filtered)
['Organization name} ',
 '> (777) 777-7777} ',
 ' class="lsn-mB6 adr">1 Address, MA 02114 } ',
 'Other organization} ',
 '> (555) 555-5555} ',
 ' class="lsn-mB6 adr">301 Address, MA 02121 } ',
 'Organization} ']
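Both approaches above build a new list. If you need the existing list object itself to lose the elements (for example because other code holds a reference to it), a slice assignment is one way to do that. This is only a sketch: the helper name drop_matching_in_place and the shortened sample data are hypothetical, not part of the question.

```python
import re

ALL_CAPS_WORD = re.compile(r'\b[A-Z]{3,}\b')

def drop_matching_in_place(items, pattern=ALL_CAPS_WORD):
    """Remove every item in which pattern is found, mutating items itself."""
    # Assigning to the full slice replaces the contents but keeps the same
    # list object, so other references to it see the change too.
    items[:] = [item for item in items if not pattern.search(item)]

# Shortened sample data (assumption: same shape as the list in the question).
rows = ['Organization name} ', '> (777) 777-7777} ', ' MORE INFO YOUR LISTING MAP ']
alias = rows  # a second reference to the same list

drop_matching_in_place(rows)
print(alias)
```

This avoids calling rows.remove() inside a loop over rows, which can skip elements because the list shrinks while it is being iterated.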
Andrew Clark answered Oct 13 '22 12:10