
Recursively compare two directories to ensure they have the same files and subdirectories

From what I observe, filecmp.dircmp is recursive but inadequate for my needs, at least in Python 2. I want to compare two directories and all the files they contain. Does something like this already exist, or do I need to build it myself (using os.walk, for example)? I prefer something pre-built, where someone else has already done the unit testing :)

The actual 'comparison' can be sloppy (ignore permissions, for example), if that helps.

I would like something boolean, whereas report_full_closure is a printed report. It also only goes down common subdirs. AFAIC, if anything exists only in the left or only in the right directory, the two directories are different. I built this using os.walk instead.
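A minimal sketch of what such an os.walk-based check might look like. This is not the asker's actual code, which is not shown; the function name trees_match and its helper listing are hypothetical. It assumes that "the same" means identical relative paths plus identical file contents, and it ignores permissions.

```python
import filecmp
import os


def trees_match(dir1, dir2):
    """Return True if both trees hold the same relative paths and file contents."""
    def listing(root):
        # Map each relative directory path to its sorted (subdirs, files) lists.
        found = {}
        for current, dirs, files in os.walk(root):
            rel = os.path.relpath(current, root)
            found[rel] = (sorted(dirs), sorted(files))
        return found

    left, right = listing(dir1), listing(dir2)
    if left != right:
        # Some directory or file name exists on one side only.
        return False
    for rel, (_, files) in left.items():
        for name in files:
            path1 = os.path.join(dir1, rel, name)
            path2 = os.path.join(dir2, rel, name)
            # shallow=False forces a comparison of the file contents.
            if not filecmp.cmp(path1, path2, shallow=False):
                return False
    return True
```

Comparing the two listings first means anything present on only one side makes the trees different, which is the behaviour described above.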

Gregg Lind asked Nov 15 '10 18:11




2 Answers

Here's an alternative implementation of the comparison function using the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply through the common_dirs and subdirs attributes of dircmp, since that would implicitly use the default "shallow" file comparison, which is probably not what you want. In the implementation below, files with the same name are always compared by content only.

import filecmp
import os.path

def are_dir_trees_equal(dir1, dir2):
    """
    Compare two directories recursively. Files in each directory are
    assumed to be equal if their names and contents are equal.

    @param dir1: First directory path
    @param dir2: Second directory path

    @return: True if the directory trees are the same and
        there were no errors while accessing the directories or files,
        False otherwise.
    """

    dirs_cmp = filecmp.dircmp(dir1, dir2)
    if len(dirs_cmp.left_only)>0 or len(dirs_cmp.right_only)>0 or \
        len(dirs_cmp.funny_files)>0:
        return False
    (_, mismatch, errors) =  filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False)
    if len(mismatch)>0 or len(errors)>0:
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = os.path.join(dir1, common_dir)
        new_dir2 = os.path.join(dir2, common_dir)
        if not are_dir_trees_equal(new_dir1, new_dir2):
            return False
    return True
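To illustrate why the "shallow" default matters, here is a small sketch (not part of the original answer). The helper name shallow_vs_deep and the file names are made up for the demonstration. It writes two files and forces their modification times to match, so that a shallow comparison sees identical os.stat signatures (file type, size, mtime) whenever the sizes are equal.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep(content_a, content_b):
    """Return (shallow_result, deep_result) for two files given the same mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.bin")
        path_b = os.path.join(tmp, "b.bin")
        for path, content in ((path_a, content_a), (path_b, content_b)):
            with open(path, "wb") as fh:
                fh.write(content)
            # Force identical timestamps so only size and content can differ.
            os.utime(path, (1_000_000_000, 1_000_000_000))
        filecmp.clear_cache()
        shallow = filecmp.cmp(path_a, path_b, shallow=True)
        deep = filecmp.cmp(path_a, path_b, shallow=False)
        return shallow, deep


# Same size and mtime, different bytes: only the deep comparison notices.
print(shallow_vs_deep(b"abcd", b"wxyz"))  # (True, False)
```

This is the situation the function above guards against by passing shallow=False to filecmp.cmpfiles.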
Mateusz Kobos answered Oct 14 '22 12:10



filecmp.dircmp is the way to go, but it does not compare the content of files found at the same path in the two compared directories. Instead, filecmp.dircmp only looks at file attributes. Since dircmp is a class, you can fix that by subclassing it and overriding its phase3 method, which compares the common files, so that content is compared instead of only os.stat attributes.

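import filecmp

class dircmp(filecmp.dircmp):
    """
    Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
    subclass compares the content of files with the same path.
    """
    def phase3(self):
        """
        Find out differences between common files.
        Ensure we are using content comparison with shallow=False.
        """
        fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                 shallow=False)
        self.same_files, self.diff_files, self.funny_files = fcomp

    # filecmp.dircmp computes these attributes lazily through its methodmap,
    # which references the base-class phase3 directly, so the override has to
    # be registered there as well or it is never called.
    methodmap = dict(filecmp.dircmp.methodmap,
                     same_files=phase3, diff_files=phase3, funny_files=phase3)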

Then you can use this to return a boolean:

import os.path

def is_same(dir1, dir2):
    """
    Compare two directory trees content.
    Return False if they differ, True if they are the same.
    """
    compared = dircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.diff_files
        or compared.funny_files):
        return False
    for subdir in compared.common_dirs:
        if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
            return False
    return True

In case you want to reuse this code snippet, it is hereby dedicated to the Public Domain or the Creative Commons CC0 at your choice (in addition to the default license CC-BY-SA provided by SO).

Philippe Ombredanne answered Oct 14 '22 12:10
