Compare 2 Folders and Find Files with Differing Byte Counts

Tags: file, linux, directory, compare, size

Using GNOME in Linux Mint 12, I copied a folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS flash drive to another NTFS flash drive. According to GNOME, the file counts match, but according to du (and other programs), the byte counts don't match. (I've had the same problem copying folders in other Linux distros and Windows XP.)

I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?
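A quick sanity check before looking at individual files: by default du reports the disk space each file occupies, which depends on the file system's cluster size, so two copies holding identical bytes can still give different du totals. GNU du -b (--apparent-size --block-size=1) is closer, but it also counts the directories' own sizes, which may vary between file systems. Summing only the regular files' byte sizes avoids both effects. This is a sketch assuming GNU find (for -printf); the function name total_file_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the byte sizes of regular files only under a directory.
# Ignores allocated blocks and directory entries, unlike plain du.
# Assumes GNU find (for -printf).
total_file_bytes() {
  # One size per line; awk adds them up. printf "%.0f" avoids
  # exponent notation for large totals in some awk implementations.
  find "$1" -type f -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Run it on both copies (total_file_bytes dir1, then total_file_bytes dir2). Equal totals would suggest the du difference comes from allocation rather than from the files' contents.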


asked Jun 18 '12 16:06

user1464189

1 Answer

I would adapt the answer by @user1464130, as it has trouble handling spaces in file names.

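# Record "relative-path size" for every regular file in each tree, sorted by path.
(cd dir1 && find . -type f -printf "%p %s\n" | sort > ~/dir1.txt)
(cd dir2 && find . -type f -printf "%p %s\n" | sort > ~/dir2.txt)
# Differing lines are files missing from one side or whose sizes differ.
diff ~/dir1.txt ~/dir2.txt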
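If you only want the files whose sizes differ, rather than every differing line, the two listings can be paired by path with join and filtered with awk. This is a sketch assuming GNU find (for -printf) and bash (for $'\t' and process substitution). It uses a tab as the separator so spaces in names are harmless, though names containing tabs or newlines would still confuse it. The function names list_sizes and size_mismatches are hypothetical. join silently skips files present in only one tree, so the plain diff above is still the way to spot those.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list files present in both trees whose sizes differ.
# Assumes GNU find (for -printf) and paths without tabs or newlines.

list_sizes() {
  # Emit "relative-path<TAB>size" per regular file, sorted bytewise on the path
  # so that join sees both inputs in the same (C locale) order.
  (cd "$1" && find . -type f -printf '%p\t%s\n') | LC_ALL=C sort -t $'\t' -k1,1
}

size_mismatches() {
  # join pairs lines that share a path: "path<TAB>size1<TAB>size2".
  # awk keeps only the pairs whose sizes differ.
  LC_ALL=C join -t $'\t' <(list_sizes "$1") <(list_sizes "$2") |
    awk -F'\t' '$2 != $3 { print $1 ": " $2 " vs " $3 " bytes" }'
}
```

For example, size_mismatches dir1 dir2 prints one line per mismatched file, such as ./sub dir/f 1.txt: 5 vs 4 bytes.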

If you want to run a command on each file and use its result in the report, you can pipe the find output into a Bash while read loop. This example uses md5sum to compute a checksum for each file.

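# Print the size before the path so that read puts the rest of the line (spaces included) into path.
find . -maxdepth 1 -type f -printf "%s %p\n" | while read -r size path; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size"; done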

Each $() command substitution runs once per file, which is what lets us compute a separate checksum for every file. tr -s " " squeezes each run of consecutive spaces into a single space, and cut -f 1 -d " " extracts the first space-separated field, which is the checksum. Without cut, the file name would appear twice, because md5sum also prints it on stdout after the checksum.

Here is an example run without the comparison step (no diff). Note that I've used a dash (-) to separate the three pieces of data output for each file, but that could be a problem if you want to feed the output to another program.

$ find . -maxdepth 1 -name "*.c" -type f -printf "%s %p\n" | while read -r size path; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size"; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413
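On the separator problem: a tab-separated report is easier to feed to other programs (cut -f, awk -F'\t', sort -t). The sketch below assumes GNU find, GNU stat (for --printf) and coreutils md5sum; the function name file_report is hypothetical. It reads NUL-separated paths (find -print0 with read -d ''), so names with spaces are handled, and prints path, size and checksum on one line per file. Reading each file on stdin makes md5sum print - instead of the name, so cut only has to take the first field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "path<TAB>size<TAB>md5" for each regular file under a directory.
# Paths are read NUL-separated, so spaces are safe; names containing tabs or
# newlines would still make the one-line-per-file output ambiguous.
file_report() {
  find "$1" -type f -print0 |
    while IFS= read -r -d '' path; do
      size=$(stat --printf='%s' -- "$path")
      # md5sum on stdin prints "<hash>  -", so the first field is the hash.
      sum=$(md5sum < "$path" | cut -d ' ' -f 1)
      printf '%s\t%s\t%s\n' "$path" "$size" "$sum"
    done
}
```

Lines come out in find's traversal order, so pipe the result through sort if you want to diff two reports.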

EDIT: To handle spaces in file names while still getting both the checksum and the size, you can use the following code.

$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read -r checksum path; do echo "$path $(stat --printf="%s" "$path") $checksum"; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309


answered Nov 05 '22 07:11

Ludovic Kuty
