How can I get the base of a URL in Python?

I'm trying to determine the base of a URL, or everything besides the page and parameters. I tried using split, but is there a better way than splitting it up into pieces? Is there a way I can remove everything from the last '/'?

Given this: http://127.0.0.1/asdf/login.php

I would like: http://127.0.0.1/asdf/

950

asked Feb 25 '16 01:02

Brendan

2 Answers

The best way to do this is to use urllib.parse.

From the docs:

The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.

You'd want to do something like this using urlsplit and urlunsplit:

from urllib.parse import urlsplit, urlunsplit

split_url = urlsplit('http://127.0.0.1/asdf/login.php?q=abc#stackoverflow')

# You now have:
# split_url.scheme   "http"
# split_url.netloc   "127.0.0.1"
# split_url.path     "/asdf/login.php"
# split_url.query    "q=abc"
# split_url.fragment "stackoverflow"

# Keep the path up to and including the last '/'
clean_path = "".join(split_url.path.rpartition("/")[:-1])
# "/asdf/"

# urlunsplit joins a urlsplit tuple back into a string
full_url = urlunsplit(split_url)
# "http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"

# Pass the trimmed path and an empty query and fragment to get the base
clean_url = urlunsplit((split_url.scheme, split_url.netloc, clean_path, "", ""))
# "http://127.0.0.1/asdf/"

# A more advanced example
advanced_split_url = urlsplit('http://foo:bar@127.0.0.1:5000/asdf/login.php?q=abc#stackoverflow')

# You now have *in addition* to the above:
# advanced_split_url.username   "foo"
# advanced_split_url.password   "bar"
# advanced_split_url.hostname   "127.0.0.1"
# advanced_split_url.port       5000 (an int, not a string)
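If you need this in more than one place, the steps above can be wrapped in a small helper. This is only a sketch: the name base_url is made up for illustration, and it assumes you want the query string and fragment dropped, as in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(url: str) -> str:
    """Return the URL up to and including the last '/' of its path.

    Hypothetical helper name; the query string and fragment are discarded.
    """
    parts = urlsplit(url)
    # rpartition splits on the last '/', so head + sep keeps the trailing slash.
    head, sep, _ = parts.path.rpartition("/")
    # Empty strings for query and fragment leave them out of the result.
    return urlunsplit((parts.scheme, parts.netloc, head + sep, "", ""))


print(base_url("http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"))
# http://127.0.0.1/asdf/
```

Because only the path component is trimmed, a '/' inside the query string or fragment cannot affect the result.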

146

answered Sep 21 '22 19:09

dalanmiller

Well, for one, you could just use os.path.dirname:

>>> import os
>>> os.path.dirname('http://127.0.0.1/asdf/login.php')
'http://127.0.0.1/asdf'

It's not designed for URLs, but it happens to work on them (even on Windows). It just doesn't keep the trailing slash, which you can add back yourself.

You may also want to look at urllib.parse.urlparse for more fine-grained parsing; if the URL has a query string or hash involved, you'd want to parse it into pieces, trim the path component returned by parsing, then recombine, so the path is trimmed without losing query and hash info.
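A rough sketch of that parse, trim, and recombine approach follows. The function name trim_last_segment is invented for this example, and it assumes you want the query string and fragment kept on the result.

```python
from urllib.parse import urlparse, urlunparse


def trim_last_segment(url: str) -> str:
    """Drop the last path segment but keep the query and fragment.

    Hypothetical helper name, used only for this illustration.
    """
    parsed = urlparse(url)
    # Cut the path at its last '/', then put the trailing slash back.
    trimmed_path = parsed.path.rsplit("/", 1)[0] + "/"
    # ParseResult is a named tuple, so _replace returns a modified copy.
    return urlunparse(parsed._replace(path=trimmed_path))


print(trim_last_segment("http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"))
# http://127.0.0.1/asdf/?q=abc#stackoverflow
```

To drop the query and fragment as well, pass query="" and fragment="" to _replace alongside the path.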

Lastly, if you want to just split off the component after the last slash, you can do an rsplit with a maxsplit of 1, and keep the first component:

>>> 'http://127.0.0.1/asdf/login.php'.rsplit('/', 1)[0]
'http://127.0.0.1/asdf'
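One caveat with plain string splitting: it looks at the whole URL, so a '/' in the query string throws it off. The sketch below shows that case. It also shows urllib.parse.urljoin with a relative reference of '.', which is a different technique from the rsplit above but handles this input. The example URL and the function names are made up for illustration.

```python
from urllib.parse import urljoin


def base_by_rsplit(url: str) -> str:
    """String-only approach: cut at the last '/' and restore the slash."""
    return url.rsplit("/", 1)[0] + "/"


def base_by_urljoin(url: str) -> str:
    """Resolve '.' against the URL, which yields its directory."""
    return urljoin(url, ".")


simple = "http://127.0.0.1/asdf/login.php"
tricky = "http://127.0.0.1/asdf/login.php?next=/home"  # '/' inside the query

print(base_by_rsplit(simple))   # http://127.0.0.1/asdf/
print(base_by_rsplit(tricky))   # http://127.0.0.1/asdf/login.php?next=/
print(base_by_urljoin(simple))  # http://127.0.0.1/asdf/
print(base_by_urljoin(tricky))  # http://127.0.0.1/asdf/
```

If your URLs never carry a query string or fragment, the rsplit version is fine as is.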

answered Sep 20 '22 19:09

ShadowRanger
