I'd like to know how to compare two directories (not recursively) only by filename (ignore extension) to get the difference. For example, if I have list A and B, I want to know what is present in A and not in B.
I am currently processing some images. In one directory I have source files with the extension .tiff, and in the other directory I have processed files with the extension .png. The filenames are the same in both directories; only the extension differs (e.g. a file named foo.tiff in directory A is named foo.png in directory B).
I'm trying to find which files have not yet been processed.
Thanks!
diff -qr /dir1 /dir2
-q Report only whether the files differ, not the details of the differences.
-r When comparing directories, recursively compare any subdirectories found.
Hope this helps.
First let's create a helper function:
getfiles() { find "$1" -maxdepth 1 -type f -exec bash -c 'for f in "$@"; do f=${f##*/}; printf "%s\n" "${f%.*}"; done' "" {} + | sort; }
If you run getfiles dirname, it will return a sorted list of files in that directory without the directory's name and without any extension. The -maxdepth 1 option means that find will not search recursively.
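To see what the helper does to each path, the two string operations it relies on (dropping the directory part and dropping the last extension) can be tried on a single name. This is only a minimal sketch; the path below is hypothetical.

```shell
f='A/my scan.v2.tiff'        # hypothetical path, in the form find would print it
name=$(basename "$f")        # drop the directory part:          my scan.v2.tiff
stem=${name%.*}              # drop the shortest trailing .xyz:  my scan.v2
printf '%s\n' "$stem"
```

Only the last extension is removed, so a name such as foo.tar.gz would be listed as foo.tar.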
Now, let's compare the files in directories A and B:
diff <(getfiles A) <(getfiles B)
The output is in the usual diff format. As any of diff's normal options can be used, the output format is quite flexible.
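As one illustration of that flexibility, GNU diff's line-format options can reduce the output to just the names that occur in the first list. This is a sketch that assumes GNU diff (other diff implementations may not accept these options), and the function name only_in_first is made up for this example.

```shell
# only_in_first LIST1 LIST2: print the lines that occur in LIST1 but not in LIST2.
# Both arguments are files containing sorted names, one per line.
# Assumes GNU diff: %L stands for the line itself, and an empty format
# suppresses that class of lines (added or unchanged) entirely.
only_in_first() {
    diff --old-line-format='%L' --new-line-format='' --unchanged-line-format='' "$1" "$2"
}
```

In bash this could be called as only_in_first <(getfiles A) <(getfiles B). Note that diff still exits with status 1 when the two lists differ.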
Here are sample directories A and B, each having one file that the other doesn't have:
$ ls */
A/:
bar.png foo.png qux.png
B/:
bar.tiff baz.tiff foo.tiff
The output:
$ diff <(getfiles A) <(getfiles B)
1a2
> baz
3d3
< qux
The output correctly identifies (a) that B has a baz file that is not present in A and (b) that A has a qux file that is not present in B.
Suppose that we just want to do a one-sided comparison and find which files in B are not also in A. In this case, grep can be used:
$ grep -vxFf <(getfiles A) <(getfiles B)
baz
The options used here are:
-v tells grep to exclude matching lines.
-x tells grep to match whole lines only.
-F tells grep that the patterns are fixed strings, not regular expressions.
-f tells grep to get the list of patterns from a file or, in this case, the file-like object <(getfiles A).
To list the files in A that are not in B instead, as the question asks, swap the two arguments.
File names containing spaces are handled as well. Consider these files:
$ ls */
A A/:
1 bar.png 1 foo.png 1 qux.png
B B/:
1 bar.tiff 1 baz.tiff 1 foo.tiff
The output:
$ diff <(getfiles 'A A') <(getfiles 'B B')
1a2
> 1 baz
3d3
< 1 qux
Or,
$ grep -vxFf <(getfiles 'A A') <(getfiles 'B B')
1 baz
If any of your file names have newline characters in them, this will give incorrect results. At least for the grep form, this could be extended to the more general case.
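When the extensions are known in advance, as in the question (.tiff sources, .png results), one way to sidestep the newline problem is to skip the name lists altogether and test for each expected output file directly. This is a different technique from the diff and grep forms above, sketched under the assumption that both extensions are fixed; the function name unprocessed is made up for illustration.

```shell
# unprocessed SRC DST: print each SRC/*.tiff that has no DST/<same name>.png.
# Assumes the fixed extensions from the question; does not recurse.
# The body runs in a subshell ( ... ) so its variables do not leak out.
unprocessed() (
    for f in "$1"/*.tiff; do
        [ -f "$f" ] || continue        # also skips the unexpanded pattern when nothing matches
        name=${f##*/}                  # file name without the directory part
        [ -e "$2/${name%.tiff}.png" ] || printf '%s\n' "$f"
    done
)
```

For example, unprocessed A B lists the .tiff files in A that still lack a .png in B. Because every name is handled individually inside the loop, the printf could be replaced by the processing command itself, which avoids passing unusual names through line-based output.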