A subsystem which I have no control over insists on providing filesystem paths in the form of a URI. Is there a Python module/function which can convert this path into the appropriate form expected by the filesystem in a platform-independent manner?
The single slash between host and path denotes the start of the local-path part of the URI and must be present. A valid file URI must therefore begin with either file:/path (no hostname), file:///path (empty hostname), or file://hostname/path.
The path component of a file name is the file name itself. For a URI, it is the main hierarchical part of the URI, without scheme, authority, query, or fragment.
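To make the three accepted forms concrete, here is a small sketch using urllib.parse.urlparse from the Python 3 standard library. The example URIs and the hostname "fileserver" are made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative URIs only; "fileserver" is a made-up hostname.
examples = [
    "file:/etc/hosts",              # no hostname
    "file:///etc/hosts",            # empty hostname
    "file://fileserver/etc/hosts",  # explicit hostname
]

parsed = [urlparse(uri) for uri in examples]

for uri, parts in zip(examples, parsed):
    # netloc holds the authority (host); path holds the hierarchical part.
    print(uri, "->", repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```

In all three cases the path component is the same; only the authority differs.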
You can use the os.mkdir("path/to/dir/here") function to create a directory in Python. This function is helpful if you need to create a new directory that doesn't already exist. However, as you have learned above, this function will only work across operating systems if you construct the path with os.
os.path.dirname(path): returns the directory name of the given path, that is, everything in the path except its final component.
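As a brief illustration of dirname, the sketch below calls the POSIX and Windows flavours of os.path directly (posixpath and ntpath), so it behaves the same on any platform. The sample paths are only examples.

```python
import ntpath
import posixpath

# posixpath and ntpath are the POSIX and Windows implementations behind
# os.path; calling them directly gives the same result on every platform.
posix_dir = posixpath.dirname("/home/user/some file.txt")
windows_dir = ntpath.dirname(r"C:\yikes\paths with spaces.txt")

print(posix_dir)
print(windows_dir)
```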
Use urllib.parse.urlparse to get the path from the URI:
import os
from urllib.parse import urlparse

p = urlparse('file://C:/test/doc.txt')
final_path = os.path.abspath(os.path.join(p.netloc, p.path))
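A quick look at how urlparse splits a drive-letter URI may help explain why the snippet joins netloc and path. This is a sketch for Python 3; the three-slash URI is added here for comparison and is not part of the original answer.

```python
from urllib.parse import urlparse

two_slashes = urlparse("file://C:/test/doc.txt")
three_slashes = urlparse("file:///C:/test/doc.txt")

# With two slashes the drive letter lands in the authority (netloc).
print(repr(two_slashes.netloc), repr(two_slashes.path))
# With three slashes the authority is empty and the drive stays in the path.
print(repr(three_slashes.netloc), repr(three_slashes.path))
```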
The solution from @Jakob Bowyer doesn't convert URL-encoded characters to regular UTF-8 characters. For that you need to use urllib.parse.unquote.
>>> from urllib.parse import unquote, urlparse
>>> unquote(urlparse('file:///home/user/some%20file.txt').path)
'/home/user/some file.txt'
Of all the answers so far, I found none that catches edge cases, avoids branching, is Python 2/3 compatible, and is cross-platform.
In short, this does the job, using only the standard library:
import os

try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:
    # backwards compatibility (Python 2)
    from urlparse import urlparse
    from urllib import unquote, url2pathname


def uri_to_path(uri):
    parsed = urlparse(uri)
    # Prefix the path with the host so that network locations are kept.
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    # abspath rather than normpath: normpath can strip too many slashes
    # from a host prefix on Windows.
    return os.path.abspath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
The tricky bit (I found) was when working in Windows with paths specifying a host. This is a non-issue outside of Windows: network locations in *NIX can only be reached via paths after being mounted to the root of the filesystem.
From Wikipedia:
A file URI takes the form of file://host/path, where host is the fully qualified domain name of the system on which the path is accessible [...]. If host is omitted, it is taken to be "localhost".
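One way to apply the rule quoted above is to treat an empty authority and "localhost" as equivalent. The helper name is_local_file_uri is hypothetical, and the comparison is a simplifying assumption: it does not check the host against this machine's own hostname.

```python
from urllib.parse import urlparse


def is_local_file_uri(uri):
    """Return True when a file URI names no host or names localhost.

    Hypothetical helper for illustration only; it does not compare the
    host against this machine's own hostname.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("not a file URI: {!r}".format(uri))
    return parsed.netloc.lower() in ("", "localhost")


print(is_local_file_uri("file:///home/user/.profile"))
print(is_local_file_uri("file://networkstorage/homes/rdekleer"))
```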
With that in mind, I make it a rule to ALWAYS prefix the path with the netloc provided by urlparse before passing it to os.path.abspath, which is necessary as it removes any resulting redundant slashes (os.path.normpath, which also claims to fix the slashes, can get a little over-zealous in Windows, hence the use of abspath).
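To see what the netloc prefix looks like, and why it is harmless on *NIX, the sketch below builds it with the same format string and joins it using posixpath so the result is the same on any platform. The helper name host_prefix is hypothetical; the hosts and paths are borrowed from the demo.

```python
import posixpath


def host_prefix(netloc, sep):
    # Same format string as the answer: two separators, the host, one more.
    return "{0}{0}{mnt}{0}".format(sep, mnt=netloc)


no_host = host_prefix("", "/")                   # three slashes
with_host = host_prefix("networkstorage", "/")   # //networkstorage/
unc_prefix = host_prefix("networkstorage", "\\")  # Windows-style prefix

# posixpath.join discards earlier components when a later one is absolute,
# so with POSIX rules the prefix disappears and only the URI path survives.
local = posixpath.join(no_host, "/home/rdekleer/.profile")
remote = posixpath.join(with_host, "/homes/rdekleer")

print(local)
print(remote)
print(unc_prefix)
```

With POSIX rules the host is simply dropped, which matches the remark that network locations are a non-issue outside Windows.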
The other crucial component in the conversion is using unquote to decode the URL percent-encoding, which your filesystem won't otherwise understand. Again, this might be a bigger issue on Windows, which allows things like $ and spaces in paths, which will have been encoded in the file URI.
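As a small illustration of the decoding step, the sketch below round-trips a percent-encoded path through unquote and quote (Python 3). The path is made up from pieces of the demo paths and is not taken from a real system.

```python
from urllib.parse import quote, unquote

# Made-up path combining "$" and spaces, both of which get percent-encoded.
encoded = "/c%24/WINDOWS/paths%20with%20spaces.txt"

decoded = unquote(encoded)
# quote() percent-encodes again; "/" is left untouched by default.
re_encoded = quote(decoded)

print(decoded)
print(re_encoded)
```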
For a demo:
import os
from pathlib import Path  # This demo requires pip install for Python < 3.4
import sys

try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:  # backwards compatibility:
    from urlparse import urlparse
    from urllib import unquote, url2pathname

DIVIDER = "-" * 30

if sys.platform == "win32":  # WINDOWS
    filepaths = [
        r"C:\Python27\Scripts\pip.exe",
        r"C:\yikes\paths with spaces.txt",
        r"\\localhost\c$\WINDOWS\clock.avi",
        r"\\networkstorage\homes\rdekleer",
    ]
else:  # *NIX
    filepaths = [
        os.path.expanduser("~/.profile"),
        "/usr/share/python3/py3versions.py",
    ]

for path in filepaths:
    uri = Path(path).as_uri()
    parsed = urlparse(uri)
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    normpath = os.path.normpath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    absolutized = os.path.abspath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    result = ("{DIVIDER}"
              "\norig path: \t{path}"
              "\nconverted to URI:\t{uri}"
              "\nrebuilt normpath:\t{normpath}"
              "\nrebuilt abspath:\t{absolutized}").format(**locals())
    print(result)
    assert path == absolutized
Results (WINDOWS):
------------------------------
orig path: C:\Python27\Scripts\pip.exe
converted to URI: file:///C:/Python27/Scripts/pip.exe
rebuilt normpath: C:\Python27\Scripts\pip.exe
rebuilt abspath: C:\Python27\Scripts\pip.exe
------------------------------
orig path: C:\yikes\paths with spaces.txt
converted to URI: file:///C:/yikes/paths%20with%20spaces.txt
rebuilt normpath: C:\yikes\paths with spaces.txt
rebuilt abspath: C:\yikes\paths with spaces.txt
------------------------------
orig path: \\localhost\c$\WINDOWS\clock.avi
converted to URI: file://localhost/c%24/WINDOWS/clock.avi
rebuilt normpath: \localhost\c$\WINDOWS\clock.avi
rebuilt abspath: \\localhost\c$\WINDOWS\clock.avi
------------------------------
orig path: \\networkstorage\homes\rdekleer
converted to URI: file://networkstorage/homes/rdekleer
rebuilt normpath: \networkstorage\homes\rdekleer
rebuilt abspath: \\networkstorage\homes\rdekleer
Results (*NIX):
------------------------------
orig path: /home/rdekleer/.profile
converted to URI: file:///home/rdekleer/.profile
rebuilt normpath: /home/rdekleer/.profile
rebuilt abspath: /home/rdekleer/.profile
------------------------------
orig path: /usr/share/python3/py3versions.py
converted to URI: file:///usr/share/python3/py3versions.py
rebuilt normpath: /usr/share/python3/py3versions.py
rebuilt abspath: /usr/share/python3/py3versions.py