Which URL parsing function pair should I be using and why: urlparse and urlunparse, or urlsplit and urlunsplit?
urllib.parse: Parse URLs into components. This module defines a standard interface to break Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.”
The return value from the urlparse() function is an object which acts like a tuple with 6 elements. The parts of the URL available through the tuple interface are the scheme, network location, path, parameters, query, and fragment.
urlparse(): This function parses a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL. Each tuple item is a string. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded.
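A minimal sketch of the six components described above, using Python 3's urllib.parse. The URL is a made-up example (not from the question), chosen so that every component is non-empty:

```python
from urllib.parse import urlparse

# Hypothetical URL for illustration only.
url = "http://example.com/path/file;type=a?x=1&y=%20z#frag"

parts = urlparse(url)

# The result behaves like a 6-tuple and also exposes named attributes.
scheme, netloc, path, params, query, fragment = parts

print(parts.scheme)    # addressing scheme
print(parts.netloc)    # network location, kept as a single string
print(parts.path)      # path without the trailing ";type=a"
print(parts.params)    # parameters of the last path segment
print(parts.query)     # query string; "%20" is left unexpanded
print(parts.fragment)  # fragment identifier
```

Note that the query is returned as-is: decoding % escapes or splitting it into key/value pairs is a separate step (for example with urllib.parse.parse_qs).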
The urlparse module contains functions to process URLs, and to convert between URLs and platform-specific filenames.
Directly from the docs you linked yourself:
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
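To make the difference concrete, here is a small sketch comparing the two pairs on a URL with parameters on more than one path segment. The URL is a hypothetical example, not taken from the question:

```python
from urllib.parse import urlparse, urlunparse, urlsplit, urlunsplit

# Hypothetical URL with ";params" on two different path segments.
url = "http://example.com/a;x=1/b;y=2?q=1#top"

parsed = urlparse(url)  # 6 fields: params split off the last segment only
split = urlsplit(url)   # 5 fields: path is left untouched

print(parsed.path, parsed.params)  # only the final ";y=2" is separated
print(split.path)                  # every ";..." stays inside the path

# Each "un" function is the inverse of its own parser, so pair them up:
# urlparse with urlunparse, urlsplit with urlunsplit.
print(urlunparse(parsed) == url)
print(urlunsplit(split) == url)
```

Since urlparse() only separates the parameters of the last segment, it gives a partial view of URLs like this one. If you do not specifically need that params field, urlsplit() and urlunsplit() are usually the simpler choice, which matches the recommendation quoted from the docs.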