Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a convenient way to map a file uri to os.path?

A subsystem which I have no control over insists on providing filesystem paths in the form of a uri. Is there a python module/function which can convert this path into the appropriate form expected by the filesystem in a platform independent manner?

like image 547
Gearoid Murphy Avatar asked May 12 '11 11:05

Gearoid Murphy


People also ask

Are file paths URI?

The single slash between host and path denotes the start of the local-path part of the URI and must be present. A valid file URI must therefore begin with either file:/path (no hostname), file:///path (empty hostname), or file://hostname/path .

What is the path in the URI?

The path component of a file name is the file name itself. For a URI, it is the main hierarchical part of the URI, without schema, authority, query, or fragment.

How do you create an OS path in Python?

You can use the os. mkdir("path/to/dir/here") function to create a directory in Python. This function is helpful if you need to create a new directory that doesn't already exist. However, as you have learned above, this function will only work across operating systems, if you construct the path with os.

What does OS path return?

path. dirname(path) : It is used to return the directory name from the path given. This function returns the name from the path except the path name.


3 Answers

Use urllib.parse.urlparse to get the path from the URI:

import os from urllib.parse import urlparse p = urlparse('file://C:/test/doc.txt') final_path = os.path.abspath(os.path.join(p.netloc, p.path)) 
like image 193
Jakob Bowyer Avatar answered Sep 21 '22 18:09

Jakob Bowyer


The solution from @Jakob Bowyer doesn't convert URL encoded characters to regular UTF-8 characters. For that you need to use urllib.parse.unquote.

>>> from urllib.parse import unquote, urlparse >>> unquote(urlparse('file:///home/user/some%20file.txt').path) '/home/user/some file.txt' 
like image 35
colton7909 Avatar answered Sep 21 '22 18:09

colton7909


Of all the answers so far, I found none that catch edge cases, doesn't require branching, are both 2/3 compatible, and cross-platform.

In short, this does the job, using only builtins:

try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:
    # backwards compatability
    from urlparse import urlparse
    from urllib import unquote, url2pathname


def uri_to_path(uri):
    parsed = urlparse(uri)
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    return os.path.normpath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )

The tricky bit (I found) was when working in Windows with paths specifying a host. This is a non-issue outside of Windows: network locations in *NIX can only be reached via paths after being mounted to the root of the filesystem.

From Wikipedia: A file URI takes the form of file://host/path , where host is the fully qualified domain name of the system on which the path is accessible [...]. If host is omitted, it is taken to be "localhost".

With that in mind, I make it a rule to ALWAYS prefix the path with the netloc provided by urlparse, before passing it to os.path.abspath, which is necessary as it removes any resulting redundant slashes (os.path.normpath, which also claims to fix the slashes, can get a little over-zealous in Windows, hence the use of abspath).

The other crucial component in the conversion is using unquote to escape/decode the URL percent-encoding, which your filesystem won't otherwise understand. Again, this might be a bigger issue on Windows, which allows things like $ and spaces in paths, which will have been encoded in the file URI.

For a demo:

import os
from pathlib import Path   # This demo requires pip install for Python < 3.4
import sys
try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:  # backwards compatability:
    from urlparse import urlparse
    from urllib import unquote, url2pathname

DIVIDER = "-" * 30

if sys.platform == "win32":  # WINDOWS
    filepaths = [
        r"C:\Python27\Scripts\pip.exe",
        r"C:\yikes\paths with spaces.txt",
        r"\\localhost\c$\WINDOWS\clock.avi",
        r"\\networkstorage\homes\rdekleer",
    ]
else:  # *NIX
    filepaths = [
        os.path.expanduser("~/.profile"),
        "/usr/share/python3/py3versions.py",
    ]

for path in filepaths:
    uri = Path(path).as_uri()
    parsed = urlparse(uri)
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    normpath = os.path.normpath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    absolutized = os.path.abspath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    result = ("{DIVIDER}"
              "\norig path:       \t{path}"
              "\nconverted to URI:\t{uri}"
              "\nrebuilt normpath:\t{normpath}"
              "\nrebuilt abspath:\t{absolutized}").format(**locals())
    print(result)
    assert path == absolutized

Results (WINDOWS):

------------------------------
orig path:              C:\Python27\Scripts\pip.exe
converted to URI:       file:///C:/Python27/Scripts/pip.exe
rebuilt normpath:       C:\Python27\Scripts\pip.exe
rebuilt abspath:        C:\Python27\Scripts\pip.exe
------------------------------
orig path:              C:\yikes\paths with spaces.txt
converted to URI:       file:///C:/yikes/paths%20with%20spaces.txt
rebuilt normpath:       C:\yikes\paths with spaces.txt
rebuilt abspath:        C:\yikes\paths with spaces.txt
------------------------------
orig path:              \\localhost\c$\WINDOWS\clock.avi
converted to URI:       file://localhost/c%24/WINDOWS/clock.avi
rebuilt normpath:       \localhost\c$\WINDOWS\clock.avi
rebuilt abspath:        \\localhost\c$\WINDOWS\clock.avi
------------------------------
orig path:              \\networkstorage\homes\rdekleer
converted to URI:       file://networkstorage/homes/rdekleer
rebuilt normpath:       \networkstorage\homes\rdekleer
rebuilt abspath:        \\networkstorage\homes\rdekleer

Results (*NIX):

------------------------------
orig path:              /home/rdekleer/.profile
converted to URI:       file:///home/rdekleer/.profile
rebuilt normpath:       /home/rdekleer/.profile
rebuilt abspath:        /home/rdekleer/.profile
------------------------------
orig path:              /usr/share/python3/py3versions.py
converted to URI:       file:///usr/share/python3/py3versions.py
rebuilt normpath:       /usr/share/python3/py3versions.py
rebuilt abspath:        /usr/share/python3/py3versions.py
like image 42
Ryan de Kleer Avatar answered Sep 21 '22 18:09

Ryan de Kleer