I know about urllib and urlparse, but I want to make sure I wouldn't be reinventing the wheel.
My problem is that I am going to be fetching a bunch of URLs from the same domain via the urllib library. I basically want to be able to generate URLs to use (as strings) with different paths and query params. I was hoping that something might have a syntax like:
url_builder = UrlBuilder("some.domain.com")
# should give me "http://some.domain.com/blah?foo=bar"
url_i_need_to_hit = url_builder.withPath("blah").withParams("foo=bar") # maybe a ".build()" after this
Basically I want to be able to store defaults that get passed to urlparse.urlunsplit instead of constantly clouding up the code by passing in the whole tuple every time.
Does something like this exist? Do people agree it's worth throwing together?
The requests module can help build URLs and manipulate their values dynamically. Any sub-path of a URL can be fetched programmatically, and part of it can then be substituted with new values to build new URLs.
How do you encode a URL in Python? In Python 3+, you can URL-encode any string using the quote() function provided by the urllib.parse module. By default, quote() uses the UTF-8 encoding scheme.
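A minimal sketch of that encoding step, using only the standard library. The sample strings and variable names are made up for illustration; note that quote() leaves "/" unescaped unless you pass safe="".

```python
from urllib.parse import quote, quote_plus

# quote() percent-encodes reserved characters but keeps "/" by default.
encoded_path = quote("blah/some path")

# Pass safe="" when the value must not contain any raw reserved characters.
encoded_value = quote("a&b=c", safe="")

# quote_plus() targets query strings: spaces become "+".
encoded_query_value = quote_plus("foo bar")

# Non-ASCII text is encoded as UTF-8 before being percent-escaped.
encoded_unicode = quote("café")

print(encoded_path)         # blah/some%20path
print(encoded_value)        # a%26b%3Dc
print(encoded_query_value)  # foo+bar
print(encoded_unicode)      # caf%C3%A9
```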
To find the URLs in a given string, use the findall() function from Python's regular expression module (re). It returns all non-overlapping matches of the pattern in the string, as a list of strings.
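A rough sketch of that approach. The pattern below is a deliberately simple assumption (http/https only, stops at whitespace, quotes, or angle brackets), not a complete URL grammar, and the sample text is invented.

```python
import re

# Simplified pattern (an assumption for this example): scheme, then everything
# up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

text = "See http://some.domain.com/blah?foo=bar and https://example.org/x for details."

# findall() returns every non-overlapping match as a list of strings.
found_urls = URL_PATTERN.findall(text)
print(found_urls)
```

Trailing punctuation right after a URL (for example a full stop) would be captured by this pattern, so real input may need extra cleanup.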
Are you proposing an extension to http://docs.python.org/library/urlparse.html#urlparse.urlunparse that would substitute into the 6-item tuple?
Are you talking about something like this?
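from urllib.parse import urlunparse  # Python 2: from urlparse import urlunparse

def myUnparse(someTuple, scheme=None, netloc=None, path=None, params=None, query=None, fragment=None):
    # Copy the 6-item tuple, then override only the components that were given.
    parts = list(someTuple)
    if scheme is not None: parts[0] = scheme
    if netloc is not None: parts[1] = netloc
    if path is not None: parts[2] = path
    if params is not None: parts[3] = params
    if query is not None: parts[4] = query
    if fragment is not None: parts[5] = fragment
    return urlunparse(parts)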
Is that what you're proposing?
This?
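from urllib.parse import urlparse, urlunparse  # Python 2: from urlparse import urlparse, urlunparse

class URLBuilder(object):
    def __init__(self, base):
        self.parts = list(urlparse(base))
    def __call__(self, scheme=None, netloc=None, path=None, params=None, query=None, fragment=None):
        parts = list(self.parts)  # copy, so the stored defaults are not overwritten
        if scheme is not None: parts[0] = scheme
        if netloc is not None: parts[1] = netloc
        if path is not None: parts[2] = path
        if params is not None: parts[3] = params
        if query is not None: parts[4] = query
        if fragment is not None: parts[5] = fragment
        return urlunparse(parts)

bldr = URLBuilder(someURL)
print(bldr(scheme="ftp"))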
Something like that?
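For what it's worth, the chained syntax from the question can also be sketched with the standard library alone. This is only one possible shape, not an existing library: the class name UrlBuilder comes from the question, while with_path, with_params, build, and the "http" default scheme are assumptions made for this example. It uses urlunsplit for assembly and urlencode for the query string.

```python
from urllib.parse import urlencode, urlunsplit


class UrlBuilder:
    """Hypothetical builder: stores defaults, every with_* call returns a new builder."""

    def __init__(self, netloc, scheme="http", path="", params=None):
        self.scheme = scheme
        self.netloc = netloc
        self.path = path
        self.params = dict(params or {})

    def with_path(self, path):
        # Normalise to exactly one leading slash.
        return UrlBuilder(self.netloc, self.scheme, "/" + path.lstrip("/"), self.params)

    def with_params(self, **params):
        # Merge new query params over the existing ones.
        merged = dict(self.params, **params)
        return UrlBuilder(self.netloc, self.scheme, self.path, merged)

    def build(self):
        # urlencode() escapes the values; urlunsplit() joins the five components.
        query = urlencode(self.params)
        return urlunsplit((self.scheme, self.netloc, self.path, query, ""))


url_builder = UrlBuilder("some.domain.com")
url_i_need_to_hit = url_builder.with_path("blah").with_params(foo="bar").build()
print(url_i_need_to_hit)  # http://some.domain.com/blah?foo=bar
```

Returning a new object from each call keeps the base builder reusable for many requests against the same domain, which seems to be the point of storing defaults in the first place.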
You might want to consider having a look at furl, because it might be the answer to your needs.