walk() function to find our file. The findfile() function takes the file's name and the root path as input parameters and returns the path of our specified file. This approach gives us the absolute path of the file.
Python can search for file names under a specified path. This can be done with the os module's walk() function, which takes a starting path and yields a 3-tuple of (dirpath, dirnames, filenames) for each directory beneath it. In the below example we are searching for a file named smpl.
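A minimal sketch of how os.walk() can be used this way. The function name findfile follows the text above, but its body is an assumption, as are the throwaway directory tree and the smpl.txt file name used for illustration:

```python
import os
import tempfile

def findfile(name, root):
    """Return the absolute path of the first file called `name` under `root`, or None."""
    for dirpath, dirnames, filenames in os.walk(root):
        if name in filenames:
            return os.path.abspath(os.path.join(dirpath, name))
    return None

# Build a small temporary tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    with open(os.path.join(root, "a", "b", "smpl.txt"), "w") as fh:
        fh.write("sample")

    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory visited.
    for dirpath, dirnames, filenames in os.walk(root):
        print(os.path.relpath(dirpath, root), dirnames, filenames)

    found = findfile("smpl.txt", root)
    missing = findfile("does-not-exist.txt", root)
    print(found)
```

The result is absolute only because of the os.path.abspath() call; os.walk() itself returns paths relative to whatever root it was given.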
To retrieve a file in Python, you need to know the exact path to the file. In Windows, you can view a particular file's path by right-clicking the file and choosing Properties -> General -> Location. Similarly, if a script refers to files by relative path, the working directory needs to be set to the directory those paths are relative to, which is often the directory containing the script.
The __file__ variable: __file__ is a variable that contains the path of the file from which the current module was loaded. Python sets __file__ on a module when it imports that module (or runs it as a script).
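As a small illustration of __file__, the sketch below reads it from an imported standard-library module (fnmatch, chosen arbitrarily) and then shows the common pattern of building a path next to the current script. The data.txt file name is hypothetical:

```python
import fnmatch
import os

# A module loaded from a source file carries the path of that file in __file__.
print(fnmatch.__file__)

# Directory that contains the module's file.
module_dir = os.path.dirname(os.path.abspath(fnmatch.__file__))
print(module_dir)

# Inside a script, the same pattern locates files stored next to the script.
# globals().get() avoids a NameError where __file__ is not defined (e.g. the REPL).
this_file = globals().get("__file__")
if this_file is not None:
    script_dir = os.path.dirname(os.path.abspath(this_file))
    data_path = os.path.join(script_dir, "data.txt")  # hypothetical file name
    print(data_path)
```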
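os.walk is the answer. This will find the first match:
import os

def find(name, path):
    # Walk the tree top-down and return the first file whose name matches exactly.
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)
    # Falls through and returns None if nothing matched.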
And this will find all matches:
def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result
And this will match a pattern:
import os, fnmatch

def find(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result

find('*.txt', '/path/to/dir')
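If the tree is large and you only need to iterate over the matches, a generator avoids building the whole list first. This is a sketch of that variant, not part of the answer above; iter_matches is a hypothetical name, and the temporary directory is only there to make the example runnable:

```python
import fnmatch
import os
import tempfile

def iter_matches(pattern, path):
    """Lazily yield paths under `path` whose file name matches `pattern`."""
    for root, dirs, files in os.walk(path):
        # fnmatch.filter returns the subset of names that match the pattern.
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("")

    # Report paths relative to the search root for readability.
    matches = sorted(os.path.relpath(p, top) for p in iter_matches("*.txt", top))
    print(matches)
```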
In Python 3.4 or newer you can use pathlib to do recursive globbing:
>>> import pathlib
>>> sorted(pathlib.Path('.').glob('**/*.py'))
[PosixPath('build/lib/pathlib.py'),
 PosixPath('docs/conf.py'),
 PosixPath('pathlib.py'),
 PosixPath('setup.py'),
 PosixPath('test_pathlib.py')]
Reference: https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
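pathlib also provides Path.rglob(), which behaves like glob() with "**/" prepended to the pattern. A minimal sketch, using a temporary tree with made-up file names:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as top:
    root = pathlib.Path(top)
    (root / "pkg").mkdir()
    (root / "setup.py").write_text("")
    (root / "pkg" / "core.py").write_text("")
    (root / "notes.txt").write_text("")

    # rglob("*.py") searches root and every directory below it.
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
    print(found)  # ['pkg/core.py', 'setup.py']
```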
In Python 3.5 or newer you can also do recursive globbing like this:
>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['2.txt', 'sub/3.txt']
Reference: https://docs.python.org/3/library/glob.html#glob.glob
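The glob call above searches relative to the current working directory. To search under some other directory, one option is to join that directory onto the pattern. The helper name glob_under below is hypothetical, and the temporary tree only mirrors the file names from the output above:

```python
import glob
import os
import tempfile

def glob_under(root, pattern):
    """Recursively glob `pattern` (e.g. '**/*.txt') below the directory `root`."""
    # glob.escape keeps special characters in `root` from being read as wildcards.
    full_pattern = os.path.join(glob.escape(root), pattern)
    return sorted(glob.glob(full_pattern, recursive=True))

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("1.py", "2.txt", os.path.join("sub", "3.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("")

    rel_matches = [os.path.relpath(p, top) for p in glob_under(top, "**/*.txt")]
    print(rel_matches)
```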
I used a version of os.walk and on a larger directory got times around 3.5 sec. I tried two random solutions with no great improvement, then just did:
import subprocess
paths = [line[2:] for line in subprocess.check_output("find . -iname '*.txt'", shell=True).decode().splitlines()]
While it's POSIX-only, I got 0.25 sec.
From this, I believe it's entirely possible to optimise the whole search a lot in a platform-independent way, but this is where I stopped the research.
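To make this kind of comparison on your own directory, a simple timing harness is enough. The sketch below is an assumption about how such a measurement could be set up, not the code that produced the numbers above; find_txt and timed are hypothetical names, and the tiny temporary tree stands in for a real directory:

```python
import os
import tempfile
import time

def find_txt(path):
    """Collect .txt files under `path`, ignoring case like find's -iname."""
    return [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(path)
        for name in files
        if name.lower().endswith(".txt")
    ]

def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

with tempfile.TemporaryDirectory() as top:
    for i in range(3):
        sub = os.path.join(top, "d%d" % i)
        os.mkdir(sub)
        for file_name in ("a.txt", "B.TXT", "c.py"):
            open(os.path.join(sub, file_name), "w").close()

    results, elapsed = timed(find_txt, top)
    print(len(results), "matches in %.4f s" % elapsed)
```

Timings on a real tree depend heavily on whether the filesystem cache is warm, so it is worth running each candidate more than once.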
If you are using Python on Ubuntu and you only want it to work on Ubuntu, a substantially faster way is to use the terminal's locate program, like this:
import subprocess

def find_files(file_name):
    command = ['locate', file_name]
    output = subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]
    output = output.decode()
    # splitlines() avoids the empty trailing entry that split('\n') would leave.
    search_results = output.splitlines()
    return search_results
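search_results is a list of the absolute file paths. This is tens of thousands of times faster than the methods above; for one search I've done it was ~72,000 times faster. The speed comes from locate looking names up in a prebuilt index instead of walking the filesystem.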
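A slightly more defensive variant, as a sketch: it returns an empty list when locate is not installed, and it drops paths that no longer exist, since locate's index can lag behind the filesystem. The function name locate_files is hypothetical:

```python
import os
import shutil
import subprocess

def locate_files(file_name):
    """Return existing paths reported by `locate`, or [] if locate is unavailable."""
    if shutil.which("locate") is None:
        return []
    completed = subprocess.run(
        ["locate", file_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # e.g. a missing database prints an error here
        encoding="utf-8",
        errors="replace",           # tolerate file names that are not valid UTF-8
    )
    # Keep only paths that are still present on disk.
    return [p for p in completed.stdout.splitlines() if os.path.exists(p)]

hits = locate_files("os.py")
print(len(hits), "paths found")
```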