
Python 3: Copying the most recent file in a large directory

As the title says, I'm trying to find and copy the most recent file in a large directory. Most solutions I have found either list the directory first or use glob.glob, and then use max(file, key=os.path.getmtime) to determine the latest file.

My problem with this is that the directory I am attempting to search has over 10,000 files, and listing all of those takes forever.

Is there a way to "call off" the listing, so to speak, once I've determined what the first (most recent) file is? Or is there perhaps another method I'm unaware of?

readonlyexe asked Nov 07 '22 10:11


1 Answer

You can use os.walk to iterate over the directory and apply max to the resulting generator. There are many nuances depending on your use case. For example, do you want to walk shallowly or recurse into sub-directories? As a proof of concept, you can try something like this and modify it to suit your needs.

import os
import shutil


def mtime_gen(root, *args, **kwargs):
    """Yield (mtime, path) for every file that os.walk finds under root."""
    for dirpath, dirnames, filenames in os.walk(root, *args, **kwargs):
        # NOTE:
        # To skip the depth-walk into sub-directories, clear the list in
        # place with `dirnames[:] = []` (requires the default topdown=True).
        # Merely ignoring `dirnames` does not stop the recursion.
        for basename in filenames:
            path = os.path.join(dirpath, basename)
            # Further heuristics, if any, can skip impossible candidates
            # for the most recent file with a `continue` statement here,
            # so that the expensive `stat` call is omitted for them.
            yield os.stat(path).st_mtime, path


recent_timestamp, recent_path = max(mtime_gen("/path/to/root"))
# For example, copy it; the destination path is a placeholder.
shutil.copy2(recent_path, "/path/to/destination")

This could be somewhat faster than glob because os.walk doesn't do pattern matching. Compared with os.listdir, it keeps subdirectories out of the list of file names, if that's a concern.

The bottleneck is likely the slow stat system call, so heuristics may help you skip impossible paths without calling stat on them, if you already know something about the likely outcomes.
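One possible way to reduce the stat cost, assuming Python 3.6 or later and a shallow (non-recursive) search, is os.scandir, whose entries can often report the file type without an extra system call. The sketch below is not part of the original answer: the function name `newest_file` and its `name_filter` parameter are hypothetical, and the filter stands in for whatever heuristic you happen to know about your file names.

```python
import os


def newest_file(root, name_filter=None):
    """Return the path of the most recently modified regular file directly
    inside `root` (no recursion), or None if there is no candidate.

    `name_filter` is a hypothetical hook: a callable that takes the base
    name and returns False for files that cannot be the answer, so they
    are skipped before any stat call is made.
    """
    best_mtime, best_path = None, None
    with os.scandir(root) as entries:
        for entry in entries:
            # Cheap name-based heuristic first; no system call needed.
            if name_filter is not None and not name_filter(entry.name):
                continue
            # Skip sub-directories, symlinks, and other non-regular files.
            if not entry.is_file(follow_symlinks=False):
                continue
            mtime = entry.stat(follow_symlinks=False).st_mtime
            if best_mtime is None or mtime > best_mtime:
                best_mtime, best_path = mtime, entry.path
    return best_path
```

For example, `newest_file("/path/to/root", name_filter=lambda n: n.endswith(".log"))` would only stat files whose names end in `.log`. Every directory entry is still visited, because directory listings are generally not returned in modification-time order; the saving comes from the avoided stat calls and from not building a full list in memory.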

Note that this is just a proof of concept. As with systems programming in general, you must handle complications and exceptions carefully. This is a highly non-trivial task.
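As an illustration of the kind of complication meant here, a file can be deleted or become unreadable between the moment it is listed and the moment it is stat-ed, and the directory may contain no files at all. The following is a minimal sketch of one way to tolerate both cases, not a complete treatment; `safe_mtime_gen` and `copy_most_recent` are hypothetical names introduced for this example.

```python
import os
import shutil


def safe_mtime_gen(root):
    """Yield (mtime, path) pairs for files under `root`, skipping entries
    that vanish or become unreadable between listing and stat."""
    for dirpath, dirnames, filenames in os.walk(root):
        for basename in filenames:
            path = os.path.join(dirpath, basename)
            try:
                yield os.stat(path).st_mtime, path
            except OSError:
                # Covers FileNotFoundError, PermissionError, and similar:
                # ignore this entry and keep going.
                continue


def copy_most_recent(root, destination):
    """Copy the newest file under `root` to `destination`.

    Returns the source path of the copied file, or None if no file
    was found.
    """
    # default=None avoids a ValueError when the generator is empty.
    newest = max(safe_mtime_gen(root), default=None)
    if newest is None:
        return None
    _, path = newest
    shutil.copy2(path, destination)  # copy2 also tries to keep metadata
    return path
```

This still leaves a window in which the chosen file could disappear before the copy starts, so a caller that needs to be robust would also have to handle an OSError raised by `shutil.copy2`.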

Cong Ma answered Nov 14 '22 22:11