
Compare 2 Folders and Find Files with Differing Byte Counts

Using GNOME in Linux Mint 12, I copied a folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS flash drive to another NTFS flash drive. According to GNOME the file counts match, but according to du (and other programs) the byte counts don't. (I've had the same problem copying folders in other Linux distros and in Windows XP.)

I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?

asked Jun 18 '12 16:06 by user1464189



1 Answer

I would adapt the answer by @user1464130, because it has trouble handling spaces in file names.

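# List every file with its size in bytes; the subshells keep each cd local.
(cd dir1 && find . -type f -printf "%p %s\n" | sort) > ~/dir1.txt
(cd dir2 && find . -type f -printf "%p %s\n" | sort) > ~/dir2.txt
# Lines that differ are files whose size differs or that exist on one side only.
diff ~/dir1.txt ~/dir2.txt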
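The two listings above can also be merged in a single pass so that only the mismatched files are printed. What follows is a minimal sketch, not a definitive tool: the function name size_mismatches is hypothetical, it assumes GNU find (for -printf) and awk, and it separates fields with tabs, so it assumes file names contain no tab or newline characters.

```shell
# size_mismatches DIR1 DIR2  (hypothetical helper name)
# Prints "path<TAB>size1<TAB>size2" for every file whose byte count differs.
# A file present on one side only is reported with "missing" as its size.
# Assumption: file names contain no tab or newline characters.
size_mismatches() {
    {
        # %s is the size in bytes, %P the path relative to the start directory.
        # The leading 1 or 2 records which tree the line came from.
        (cd "$1" && find . -type f -printf '1\t%s\t%P\n')
        (cd "$2" && find . -type f -printf '2\t%s\t%P\n')
    } | awk -F '\t' '
        $1 == 1 { a[$3] = $2 }   # sizes seen in the first tree
        $1 == 2 { b[$3] = $2 }   # sizes seen in the second tree
        END {
            for (p in a) {
                if (!(p in b))         print p "\t" a[p] "\tmissing"
                else if (a[p] != b[p]) print p "\t" a[p] "\t" b[p]
            }
            for (p in b) {
                if (!(p in a))         print p "\tmissing\t" b[p]
            }
        }
    ' | sort
}
```

It would be called as size_mismatches dir1 dir2. Only file metadata is read, never file contents, so it should stay fast even on a large tree.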

If you want to run a command on each file and use its result in the report, you can use a Bash while read loop. This example uses md5sum to compute a checksum for each file. It still assumes file names without spaces; see the EDIT at the end for a version that handles them.

find . -maxdepth 1 -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size"; done

Each $() is executed separately, which lets us compute the checksum for each file. tr -s squeezes each run of consecutive spaces into a single space, and cut extracts the n-th field, here the first. Without that step the file name would appear twice, because md5sum prints it on stdout after the checksum.
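To see the field extraction in isolation, here is a small sketch. The helper name checksum_only is hypothetical and not part of any tool; it assumes GNU coreutils md5sum, which prints the checksum, two spaces, and then the file name.

```shell
# checksum_only FILE  (hypothetical helper name)
# Prints only the MD5 checksum of FILE, without the trailing file name.
checksum_only() {
    # md5sum prints "<checksum>  <name>"; the checksum itself contains no
    # spaces, so the first space-separated field is the whole checksum.
    md5sum -- "$1" | cut -d ' ' -f 1
}
```

Because the argument is quoted, this also works for a file name that contains spaces.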

Here is an example run without the comparison step (no diff). Note that I've used a dash (-) to separate the three pieces of data printed for each file, which could be a problem if you want to feed the output to another program.

$ find . -maxdepth 1 -name "*.c" -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size"; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413

EDIT: To handle spaces in file names and still get the checksum and the size, you can use the following code.

$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read -r checksum path; do echo "$path $(stat --printf="%s" "$path") $checksum"; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309
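As a variation on the same idea, find can hand the file names directly to a small shell loop, so the names are never parsed out of text. This is a sketch under stated assumptions: the function name list_size_md5 is hypothetical, it relies on GNU stat and md5sum, and it prints tab-separated fields with the path last so that spaces in the name stay intact.

```shell
# list_size_md5 DIR  (hypothetical helper name)
# Prints "size<TAB>checksum<TAB>path" for every regular file under DIR.
list_size_md5() {
    # find passes the file names as arguments to the inner sh, so spaces
    # (even several in a row) in names are preserved exactly.
    find "$1" -type f -exec sh -c '
        for f do
            size=$(stat --printf=%s "$f")
            # Reading from stdin makes md5sum print "-" instead of the name;
            # cut keeps only the checksum field.
            sum=$(md5sum < "$f" | cut -d " " -f 1)
            printf "%s\t%s\t%s\n" "$size" "$sum" "$f"
        done
    ' sh {} +
}
```

Running list_size_md5 on each folder and comparing the two outputs would show both size and content differences, at the cost of reading every file in full.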
answered Nov 05 '22 07:11 by Ludovic Kuty