How do i truncate the below URL next to the domain "com" using python. i.e you tube.com only
youtube.com/video/AiL6nL
yahoo.com/video/Hhj9B2
youtube.com/video/MpVHQ
google.com/video/PGuTN
youtube.com/video/VU34MI
Is it possible to truncate like this?
Check out Pythons urlparse
library. It is a standard library so nothing else needs to be installed.
So you could do the following:
import urlparse
import re
def check_and_add_http(url):
# checks if 'http://' is present at the start of the URL and adds it if not.
http_regex = re.compile(r'^http[s]?://')
if http_regex.match(url):
# 'http://' or 'https://' is present
return url
else:
# add 'http://' for urlparse to work.
return 'http://' + url
for url in url_list:
url = check_and_add_http(url)
print(urlparse.urlsplit(url)[1])
You can read more about urlsplit()
in the documentation, including the indexes if you want to read the other parts of the URL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With