Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split a dos path into its components in Python

Tags:

python

People also ask

How do you split a directory in Python?

path. split() method in Python is used to Split the path name into a pair head and tail. Here, tail is the last path name component and head is everything leading up to that.

What is OS path split?

os.path. split (path) Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty.

How do you split a string into a list in Python?

The split() method splits a string into a list. You can specify the separator, default separator is any whitespace. Note: When maxsplit is specified, the list will contain the specified number of elements plus one.


I would do

import os
path = os.path.normpath(path)
path.split(os.sep)

First normalize the path string into a proper string for the OS. Then os.sep must be safe to use as a delimiter in string function split.


I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

folders = []
while 1:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    elif path != "":
        folders.append(path)

        break

folders.reverse()

(This pops a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)


In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')

On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn't you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to python 2 available: pathlib2


You can simply use the most Pythonic approach (IMHO):

import os

your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print path_list

Which will give you:

['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The clue here is to use os.sep instead of '\\' or '/', as this makes it system independent.

To remove colon from the drive letter (although I don't see any reason why you would want to do that), you can write:

path_list[0] = path_list[0][0]

For a somewhat more concise solution, consider the following:

def split_path(p):
    a,b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]

The problem here starts with how you're creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python is trying to special case these: \s, \m, \f, and \T. In your case, \f is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.