Is there a cleaner way to modify some parts of a URL in Python 2?
For example:
http://foo/bar -> http://foo/yah
At present, I'm doing this:
import urlparse
url = 'http://foo/bar'
# Modify path component of URL from 'bar' to 'yah'
# Use nasty convert-to-list hack due to urlparse.ParseResult being immutable
parts = list(urlparse.urlparse(url))
parts[2] = 'yah'
url = urlparse.urlunparse(parts)
Is there a cleaner solution?
newurl = url.replace('/f/', '/d/')
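A minimal sketch of what that plain string replacement does, in case it helps. The URL below is made up for illustration (only the '/f/' and '/d/' segments come from the snippet above). Note that str.replace() works on the whole string, so it will also rewrite a match that happens to sit in the query string; the optional count argument limits it to the first match.

```python
# Hypothetical URL for illustration; only '/f/' and '/d/' come from the snippet.
url = 'http://foo/f/page?next=/f/other'

# Replaces every occurrence, including the one inside the query string.
replaced_all = url.replace('/f/', '/d/')

# The optional third argument caps the number of replacements at one.
replaced_first = url.replace('/f/', '/d/', 1)

print(replaced_all)
print(replaced_first)
```

If only the path should change, parsing the URL first is probably the safer route.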
Use the re.sub() method to remove URLs from text, e.g. result = re.sub(r'http\S+', '', my_string). The re.sub() method removes any URLs from the string by replacing them with empty strings.
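A small sketch of that re.sub() call on a made-up input string (the text in my_string is an assumption, not from the thread). The pattern matches 'http' followed by any run of non-whitespace characters, so it catches https:// URLs too, and it leaves the surrounding spaces in place.

```python
import re

# Hypothetical input text for illustration.
my_string = 'docs at http://foo/bar and https://foo/yah today'

# r'http\S+' matches 'http' plus everything up to the next whitespace.
result = re.sub(r'http\S+', '', my_string)

print(repr(result))
```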
Use the urljoin function from the urllib.parse module (the urlparse module in Python 2) to join a base URL with another URL, e.g. result = urljoin(base_url, path). urljoin constructs a full (absolute) URL by combining a base URL with another URL.
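As a rough illustration with the URL from the question: urljoin resolves the second argument relative to the base, so a bare 'yah' replaces the last path segment of 'http://foo/bar'. The try/except import is only there so the sketch runs on both Python 2 and Python 3; the '/a/b' path and the trailing-slash base are assumptions added for contrast.

```python
try:
    from urlparse import urljoin  # Python 2, as in the question
except ImportError:
    from urllib.parse import urljoin  # Python 3

base_url = 'http://foo/bar'

# A relative reference replaces the last segment of the base path.
joined_relative = urljoin(base_url, 'yah')

# A reference starting with '/' replaces the whole path.
joined_rooted = urljoin(base_url, '/a/b')

# With a trailing slash on the base, the reference is appended instead.
joined_appended = urljoin('http://foo/bar/', 'yah')

print(joined_relative, joined_rooted, joined_appended)
```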
The requests module can help build URLs and manipulate them dynamically. Any path segment of a URL can be extracted programmatically and then replaced with a new value to build new URLs.
Unfortunately, the documentation is out of date; the results produced by urlparse.urlparse() (and urlparse.urlsplit()) use a collections.namedtuple()-produced class as a base.
Don't turn this namedtuple into a list, but make use of the utility method provided for just this task:
parts = urlparse.urlparse(url)
parts = parts._replace(path='yah')
url = parts.geturl()
The namedtuple._replace() method lets you create a new copy with specific elements replaced. The ParseResult.geturl() method then re-joins the parts into a URL for you.
Demo:
>>> import urlparse
>>> url = 'http://foo/bar'
>>> parts = urlparse.urlparse(url)
>>> parts = parts._replace(path='yah')
>>> parts.geturl()
'http://foo/yah'
mgilson filed a bug report (with patch) to address the documentation issue.
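If you do this in several places, the _replace()/geturl() pair can be wrapped in a small helper. This is only a sketch: replace_url_parts is a hypothetical name, the query string in the example is an assumption, and the try/except import is there so it runs on both Python 2 and Python 3. The keyword arguments are the ParseResult field names (scheme, netloc, path, params, query, fragment).

```python
try:
    from urlparse import urlparse  # Python 2, as in the question
except ImportError:
    from urllib.parse import urlparse  # Python 3


def replace_url_parts(url, **changes):
    """Return url with the named ParseResult fields replaced.

    Hypothetical helper; accepts scheme, netloc, path, params,
    query and fragment as keyword arguments.
    """
    return urlparse(url)._replace(**changes).geturl()


# Same change as in the demo, path only.
new_path_url = replace_url_parts('http://foo/bar', path='yah')

# Several components can be replaced in one call.
new_query_url = replace_url_parts('http://foo/bar?x=1', path='/yah', query='x=2')

print(new_path_url)
print(new_query_url)
```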