
Clean way to get the "true" stem of a Path object?

Expected inputs and outputs:

a                 -> a
a.txt             -> a
archive.tar.gz    -> archive
directory/file    -> file
d.x.y.z/f.a.b.c   -> f
logs/date.log.txt -> date # Mine!

Here's my implementation that feels dirty to me:

>>> from pathlib import Path
>>> example_path = Path("August 08 2015, 01'37'30.log.txt")
>>> example_path.stem
"August 08 2015, 01'37'30.log"
>>> example_path.suffixes
['.log', '.txt']
>>> suffixes_length = sum(map(len, example_path.suffixes))
>>> true_stem = example_path.name[:-suffixes_length]
>>> true_stem
"August 08 2015, 01'37'30"

It feels dirty because it breaks on paths without suffixes:

>>> ns_path = Path("no_suffix")
>>> sl = sum(map(len, ns_path.suffixes))
>>> ns_path.name[:-sl]
''

So I need to check if the Path has a suffix first:

>>> def get_true_stem(path: Path):
...     if path.suffix:
...         sl = sum(map(len, path.suffixes))
...         return path.name[:-sl]
...     else:
...         return path.stem
...
>>>
>>> get_true_stem(example_path)
"August 08 2015, 01'37'30"
>>> get_true_stem(ns_path)
'no_suffix'

And this is my current use case:

>>> import shutil
>>> from datetime import datetime
>>> file_date = datetime.strptime(true_stem, "%B %d %Y, %H'%M'%S")
>>> file_date
datetime.datetime(2015, 8, 8, 1, 37, 30)
>>> new_dest = format(file_date, "%Y-%m-%dT%H:%M:%S%z") + ".log" # ISO-8601
>>> shutil.move(str(example_path), new_dest)

Thanks.

Navith asked Aug 08 '15 06:08


6 Answers

You could just .split it:

>>> Path('logs/date.log.txt').stem.split('.')[0]
'date'

os.path works just as well:

>>> os.path.basename('logs/date.log.txt').split('.')[0]
'date'

It passes all of the tests:

In [11]: all(Path(k).stem.split('.')[0] == v for k, v in {
   ....:     'a': 'a',
   ....:     'a.txt': 'a',
   ....:     'archive.tar.gz': 'archive',
   ....:     'directory/file': 'file',
   ....:     'd.x.y.z/f.a.b.c': 'f',
   ....:     'logs/date.log.txt': 'date'
   ....: }.items())
Out[11]: True
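
For reuse, the split approach can be wrapped in a small helper. This is only a sketch: the name first_dot_stem is hypothetical and not part of pathlib. One edge case worth knowing: a dotfile such as .bashrc yields an empty string, because its name starts with the separator.

```python
from pathlib import PurePath


def first_dot_stem(path):
    """Return the final path component up to (not including) its first dot.

    Hypothetical helper name; accepts a str or a Path-like object.
    """
    # PurePath only parses the string, it never touches the file system.
    return PurePath(path).name.split(".")[0]


# The expected inputs and outputs from the question.
cases = {
    "a": "a",
    "a.txt": "a",
    "archive.tar.gz": "archive",
    "directory/file": "file",
    "d.x.y.z/f.a.b.c": "f",
    "logs/date.log.txt": "date",
}
for raw, expected in cases.items():
    assert first_dot_stem(raw) == expected
```

If dotfiles matter in your data, a suffix-based approach may be a better fit than splitting on the first dot.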

Blender answered Oct 22 '22 05:10


How about a while loop, where you keep taking .stem until the path has no suffixes remaining? Example:

from pathlib import Path
example_path = Path("August 08 2015, 01'37'30.log.txt")
example_path_stem = example_path.stem
while example_path.suffixes:
    example_path_stem = example_path.stem
    example_path = Path(example_path_stem)

Note that the while loop exits when example_path.suffixes returns an empty list (empty lists are falsy in a boolean context).

Demo:

>>> from pathlib import Path
>>> example_path = Path("August 08 2015, 01'37'30.log.txt")
>>> example_path_stem = example_path.stem
>>> while example_path.suffixes:
...     example_path_stem = example_path.stem
...     example_path = Path(example_path_stem)
...
>>> example_path_stem
"August 08 2015, 01'37'30"

For your second input, no_suffix:

>>> example_path = Path("no_suffix")
>>> example_path_stem = example_path.stem
>>> while example_path.suffixes:
...     example_path_stem = example_path.stem
...     example_path = Path(example_path_stem)
...
>>> example_path_stem
'no_suffix'
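
The same loop can be packaged as a function so it does not overwrite the original path variable. A minimal sketch, with loop_stem as a hypothetical name:

```python
from pathlib import PurePath


def loop_stem(path):
    """Keep taking .stem until no suffix is left (hypothetical helper)."""
    stem = PurePath(path).name
    # An empty suffix string is falsy, so the loop ends once nothing is left to strip.
    while PurePath(stem).suffix:
        stem = PurePath(stem).stem
    return stem


assert loop_stem("August 08 2015, 01'37'30.log.txt") == "August 08 2015, 01'37'30"
assert loop_stem("no_suffix") == "no_suffix"
```

Because pathlib does not treat a leading dot as the start of a suffix, this version leaves a dotfile name such as .bashrc intact.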

Anand S Kumar answered Oct 22 '22 03:10


Here's another possible solution to the given problem:

from pathlib import Path

if __name__ == '__main__':
    dataset = [
        ('a', 'a'),
        ('a.txt', 'a'),
        ('archive.tar.gz', 'archive'),
        ('directory/file', 'file'),
        ('d.x.y.z/f.a.b.c', 'f'),
        ('logs/date.log.txt', 'date'),
    ]
    for path, stem in dataset:
        path = Path(path)
        assert path.name.replace("".join(path.suffixes), "") == stem
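
Note that str.replace removes every occurrence of the joined suffixes, not just the trailing one, so a name like .a.a comes back empty. A sketch of a slicing variant that only cuts the tail (slice_stem is a hypothetical name):

```python
from pathlib import PurePath


def slice_stem(path):
    """Cut the joined suffixes off the end of the name (hypothetical helper)."""
    p = PurePath(path)
    name = p.name
    tail = "".join(p.suffixes)
    # Use an explicit end index: name[:-0] would be empty, name[:len(name)] is not.
    return name[: len(name) - len(tail)]


assert slice_stem("archive.tar.gz") == "archive"
assert slice_stem("a") == "a"
assert slice_stem(".a.a") == ".a"
```

The explicit end index also removes the need for a separate branch for paths without suffixes.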

BPL answered Oct 22 '22 03:10


Why not go recursively?

from pathlib import Path

def true_stem(path):
    stem = Path(path).stem
    return stem if stem == path else true_stem(stem)

assert true_stem('d.x.y.z/f.a.b.c') == 'f'

plankthom answered Oct 22 '22 05:10


Another approach uses pattern matching:

import re
from pathlib import Path
all(re.search(r'^(.*?)(?:[.]|$)', Path(k).name).group(1) == v for k, v in {
   'a': 'a',
   'a.txt': 'a',
   'archive.tar.gz': 'archive',
   'directory/file': 'file',
   'd.x.y.z/f.a.b.c': 'f',
   'logs/date.log.txt': 'date'
   }.items())

The |$ alternative lets the pattern match names that have no suffix at all; a plain '[.]' after the group may be used instead if all your paths have at least one suffix.

Jim Robinson answered Oct 22 '22 03:10


If you want to use only pathlib, you could also use:

>>> Path('logs/date.log.txt').with_suffix('').stem
'date'

EDIT:

As pointed out in the comments, this doesn't work if the file name has more than two suffixes. Although that doesn't sound very likely (and pathlib itself has no native way to deal with it), if you want to stay with pathlib alone you can chain another call:

>>> Path('logs/date.log.txt.foo').with_suffix('').with_suffix('').stem
'date'
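
For an arbitrary number of suffixes, the chained calls can be written as a loop. A sketch, assuming a hypothetical helper named without_suffixes; it returns a path rather than a string, so the parent directory is preserved:

```python
from pathlib import PurePath


def without_suffixes(path):
    """Apply with_suffix('') until no suffix remains (hypothetical helper)."""
    p = PurePath(path)
    # Each pass drops only the last suffix; the parent directory is untouched.
    while p.suffix:
        p = p.with_suffix("")
    return p


stripped = without_suffixes("logs/date.log.txt.foo")
assert stripped.name == "date"
assert stripped.parent.name == "logs"
```

Taking .name (or .stem) of the result gives the "true" stem as a string.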

universvm answered Oct 22 '22 04:10