
Python filter/remove URLs from a list

I have a text file of about 14,000 URLs. Below are a couple of examples:

http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123

I have loaded the text file into a Python list and am trying to separate all the URLs containing CONTENT_ITEM_ID into a list of their own. What would be the best way to do this in Python?

Cheers

RailsSon asked Nov 03 '08 at 11:11


1 Answer

Here's another alternative to Graeme's, using the newer list comprehension syntax:

list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]

Which you prefer is a matter of taste!
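A minimal sketch (Python 3) applying the list comprehension to the sample URLs from the question, assuming the lines have already been read into a list with their trailing newlines stripped (lines iterated straight from a file object keep their "\n"). It also collects the non-matching URLs. The second half swaps in a stricter check that is not part of the answer above: it uses the standard library's urllib.parse so that a URL matches only when CONTENT_ITEM_ID is an actual query-string parameter, not merely text appearing somewhere in the URL. The helper name has_query_param is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Sample data copied from the question; in practice these would be the
# lines of the text file, e.g. open(path).read().splitlines().
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=11",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123",
]

# Substring test, as in the answer: matches the text anywhere in the URL.
content_urls = [url for url in urls if "CONTENT_ITEM_ID" in url]
other_urls = [url for url in urls if "CONTENT_ITEM_ID" not in url]


def has_query_param(url, name):
    """Hypothetical helper: True if `name` is a query-string parameter of `url`."""
    # keep_blank_values=True so that "?CONTENT_ITEM_ID=" still counts as present
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return name in query


# Stricter filter: only URLs whose query string has a CONTENT_ITEM_ID key.
strict_content_urls = [url for url in urls if has_query_param(url, "CONTENT_ITEM_ID")]

print(len(content_urls), len(other_urls))  # 3 2
print(strict_content_urls == content_urls)  # True
```

For the sample data both filters agree. They differ only for URLs such as "http://www.domainname.com/page?x=CONTENT_ITEM_ID", which the substring test accepts and the query-parameter check rejects.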

bobince answered Oct 17 '22 at 17:10
