I have written code to find out common files between two given folders (paths) accounting for all levels of sub folders if present.
Please suggest if there is a more efficient method. It's taking too long if folders with many levels of subfolders are given.
def findCommonDeep(self,path1,path2):
commonfiles = []
for (dirpath1, dirname1, filenames1) in os.walk(path1):
for file in filenames1:
for (dirpath2, dirname2, filenames2) in os.walk(path2):
if (file in filenames2 and isfile(join(dirpath2, file))):
commonfiles.append(file)
print(commonfiles)
and calling this function with paths, like shown below:
findCommonDeep("/home/naseer/Python", "/home/naseer/C")
I understand that if I store a list of all files for any given path the speed of execution can be reduced. but I guess that will run out of memory. Please guide me in approaching to this more efficiently.
You can use generator expression to transform the output of os.walk
into two sets and use set intersection to efficiently identify the common paths.
import os
def findCommonDeep(path1, path2):
files1 = set(os.path.relpath(os.path.join(root, file), path1) for root, _, files in os.walk(path1) for file in files)
files2 = set(os.path.relpath(os.path.join(root, file), path2) for root, _, files in os.walk(path2) for file in files)
return files1 & files2
To reduce code duplication in the code above, you can use another list comprehension:
import os
def findCommonDeep(path1, path2):
return set.intersection(*(set(os.path.relpath(os.path.join(root, file), path) for root, _, files in os.walk(path) for file in files) for path in (path1, path2)))
And if you're looking for just the common file names rather than common path names, you can make the generator expression output just the file names instead:
def findCommonDeep(path1, path2):
return set.intersection(*(set(file for _, _, files in os.walk(path) for file in files) for path in (path1, path2)))
This is more efficient because it takes advantage of Python's set intersection operation, which has an average time complexity of O(min(len(n), len(m))
, whereas your code with 2 nested loops always takes O(n^2)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With