I have a text file of URLs, about 14000. Below is a couple of examples:
http://www.domainname.com/pagename?CONTENT_ITEM_ID=100¶m2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101¶m2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102¶m2=123
I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?
Cheers
The re. sub() function provides the most straightforward approach to remove URLs from text in Python. This function is used to substitute a given substring with another substring in any provided string. It uses a regex pattern to find the substring and then replace it with the provided substring.
sub() method to remove URLs from text, e.g. result = re. sub(r'http\S+', '', my_string) . The re. sub() method will remove any URLs from the string by replacing them with empty strings.
To remove a hyperlink but keep the text, right-click the hyperlink and click Remove Hyperlink. To remove the hyperlink completely, select it and then press Delete.
Here's another alternative to Graeme's, using the newer list comprehension syntax:
list2= [line for line in file if 'CONTENT_ITEM_ID' in line]
Which you prefer is a matter of taste!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With