Removing HTTP and WWW from a URL in Python

Tags: python, url

url1='www.google.com'
url2='http://www.google.com'
url3='http://google.com'
url4='www.google'
url5='http://www.google.com/images'
url6='https://www.youtube.com/watch?v=6RB89BOxaYY'

How can I strip http(s):// and www. from a URL in Python?

asked Nov 17 '16 08:11 by guri



2 Answers

A more elegant solution is to use urlparse:

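from urllib.parse import urlparse

def get_hostname(url, uri_type='both'):
    """Return the host part of url, with or without the scheme."""
    parsed_uri = urlparse(url)
    if uri_type == 'both':
        # e.g. 'http://www.google.com/'
        return '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    elif uri_type == 'netloc_only':
        # e.g. 'www.google.com'
        return '{uri.netloc}'.format(uri=parsed_uri)
    # Fail loudly instead of silently returning None
    raise ValueError("uri_type must be 'both' or 'netloc_only'")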

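With uri_type='both', the result includes the scheme (http or https, depending on the link); with uri_type='netloc_only', only the network location is returned, for example www.google.com. Note that netloc still contains the www. prefix, and that urlparse leaves netloc empty when the URL has no scheme, as in 'www.google.com'.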
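To cover all six URLs from the question with urlparse, something like the following sketch may work. The helper name strip_scheme_and_www is hypothetical and not part of the answer above. It assumes that a URL without "://" has no scheme, and it prepends "//" in that case so that urlparse fills in netloc. Fragments (the part after #) are dropped in this sketch.

```python
from urllib.parse import urlparse

def strip_scheme_and_www(url):
    """Hypothetical helper: drop http(s):// and a leading www. from url."""
    # urlparse only fills netloc when the URL has a scheme or starts with '//'
    if '://' not in url:
        url = '//' + url
    parsed = urlparse(url)
    host = parsed.netloc
    if host.startswith('www.'):
        host = host[len('www.'):]
    # Keep the path and the query string, if any
    rest = parsed.path
    if parsed.query:
        rest += '?' + parsed.query
    return host + rest

print(strip_scheme_and_www('https://www.youtube.com/watch?v=6RB89BOxaYY'))
```

Because the scheme and host are read from the parsed result rather than matched as text, an "http://" that appears later in the URL (for example inside a query string) is left untouched.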

answered Sep 20 '22 14:09 by JohnAndrews


You can use the string method replace:

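url = 'http://www.google.com/images'
# Only removes the exact text 'http://www.'; https URLs and URLs
# without a scheme or without www. are left unchanged
url = url.replace("http://www.","")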

or you can use regular expressions:

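import re

# Matches http:// or https://, optionally followed by www.
pattern = re.compile(r"https?://(www\.)?")
# Remove the prefix, then trim whitespace and any leading or trailing slashes
url = pattern.sub('', 'http://www.google.com/images').strip().strip('/')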
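
The regular expression above needs a scheme, so a URL such as 'www.google.com' passes through unchanged. As a sketch, the variant below (an assumption, not the pattern from this answer) anchors the match at the start of the string and makes both the scheme and www. optional, then runs it over the six URLs from the question.

```python
import re

# Assumed variant of the pattern: anchored at the start, scheme optional
PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?")

urls = [
    'www.google.com',
    'http://www.google.com',
    'http://google.com',
    'www.google',
    'http://www.google.com/images',
    'https://www.youtube.com/watch?v=6RB89BOxaYY',
]

# Trim whitespace first so the anchored pattern sees the real start of the URL
stripped = [PREFIX.sub('', u.strip()).strip('/') for u in urls]
for before, after in zip(urls, stripped):
    print(before, '->', after)
```

Anchoring with ^ also keeps the pattern from removing an "http://" that appears later in the string.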

answered Sep 18 '22 14:09 by Januka samaranyake