In Linux, how to compare two directories by filename only and get a list of files that did not match

I'd like to know how to compare two directories (not recursively) by filename only, ignoring extensions, to get the difference. For example, given directories A and B, I want to know which files are present in A but not in B.

I am currently processing some images. One directory holds the source files, with the extension .tiff, and the other holds the processed files, with the extension .png. The filenames are the same in both directories; only the extension differs (e.g., a file named foo.tiff in directory A is named foo.png in directory B).

I'm trying to find which files have not yet been processed.

Thanks!

apenngrace asked Dec 26 '22 02:12


2 Answers

Hope this helps.

-q Report only whether the files differ, not the details of the differences.
-r When comparing directories, recursively compare any subdirectories found.

diff -qr /dir1 /dir2

Harish Prasanna answered Jan 29 '23 08:01


First let's create a helper function:

getfiles() { find "$1" -maxdepth 1 -type f -exec bash -c 'for f in "$@"; do f=${f##*/}; printf "%s\n" "${f%.*}"; done' "" {} + | sort; }

If you run getfiles dirname, it will return a sorted list of files in that directory without the directory's name and without any extension. The -maxdepth 1 option means that find will not search recursively.
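The extension stripping in getfiles relies on the shell's `${f%.*}` parameter expansion. As a small illustration (the path below is made up for the example, and `${f##*/}` is used here as a pure-shell stand-in for basename), note that only the last extension is removed:

```shell
# Hypothetical path, chosen only to show the expansions.
f='A/foo.tar.gz'

name=${f##*/}    # drop everything up to the last slash: foo.tar.gz
stem=${name%.*}  # drop the shortest trailing ".something": foo.tar

printf '%s\n' "$stem"   # prints foo.tar
```

So a file with a double extension such as foo.tar.gz would be listed as foo.tar, not foo.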

Now, let's compare the files in directories A and B:

diff <(getfiles A) <(getfiles B)

The output is in the usual diff format. As any of diff's normal options can be used, the output format is quite flexible.

Example

Here are sample directories A and B, each having one file that the other doesn't have:

$ ls */
A/:
bar.png  foo.png  qux.png

B/:
bar.tiff  baz.tiff  foo.tiff

The output:

$ diff <(getfiles A) <(getfiles B)
1a2
> baz
3d3
< qux

The output correctly identifies (a) that B has a baz file that is not present in A and (b) that A has a qux file that is not present in B.

Alternative Output

Suppose that we just want to do a one-sided comparison and find what files in B are not also in A. In this case, grep can be used:

$ grep -vxFf <(getfiles A) <(getfiles B)
baz

The options used here are:

  • -v tells grep to exclude matching lines

  • -x tells grep to match whole lines only

  • -F tells grep that the patterns are fixed strings, not regular expressions.

  • -f tells grep to get the list of patterns from a file or, in this case, the file-like object <(getfiles A).
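For the direction asked about in the question (present in the first directory, missing from the second), comm is another option besides grep. The following is a minimal sketch, not part of the original answer: `only_in_first` is a hypothetical name, the helper is written along the lines of getfiles above, and bash is assumed because of the process substitution.

```shell
#!/usr/bin/env bash
# Helper in the spirit of the answer's getfiles: sorted file names from one
# directory (non-recursive), without the directory part or the last extension.
getfiles() {
  find "$1" -maxdepth 1 -type f -exec bash -c \
    'for f in "$@"; do f=${f##*/}; printf "%s\n" "${f%.*}"; done' "" {} + | sort
}

# only_in_first DIR1 DIR2 (hypothetical name):
# print the names that occur in DIR1 but not in DIR2.
# comm needs sorted input, which getfiles provides.
# -2 suppresses lines unique to the second list, -3 suppresses common lines.
only_in_first() {
  comm -23 <(getfiles "$1") <(getfiles "$2")
}
```

With the question's layout, `only_in_first A B` would list the .tiff names in A that have no .png counterpart in B yet.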

Example With File and Directory Names That Include Spaces

Consider these files:

$ ls */
A A/:
1 bar.png  1 foo.png  1 qux.png

B B/:
1 bar.tiff  1 baz.tiff  1 foo.tiff

The output:

$ diff <(getfiles 'A A') <(getfiles 'B B')
1a2
> 1 baz
3d3
< 1 qux

Or,

$ grep -vxFf <(getfiles 'A A') <(getfiles 'B B')
1 baz

Limitation

If any of your file names contain newline characters, this will give incorrect results, because each line of getfiles output is treated as one name. The grep form, at least, could be extended to handle that more general case.
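One way around the newline limitation is to avoid line-based tools altogether and compare names inside the shell. The sketch below is an assumption-laden alternative rather than an extension of the grep form: `missing_in_second` is a hypothetical name, it uses plain globbing (so hidden files are skipped and symlinks to regular files count as files, unlike find -type f), and it does a simple nested loop, which is quadratic in the number of files.

```shell
# missing_in_second DIR1 DIR2 (hypothetical name):
# print each regular file in DIR1 for which DIR2 has no file with the same
# name once the last extension is removed. Names are never split on
# newlines, because they are compared as whole shell strings.
# The body runs in a subshell ( ... ) so its variables do not leak out.
missing_in_second() (
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}          # strip the directory part
    stem=${name%.*}        # strip the last extension
    found=0
    for g in "$2"/*; do
      [ -f "$g" ] || continue
      gname=${g##*/}
      if [ "${gname%.*}" = "$stem" ]; then
        found=1
        break
      fi
    done
    [ "$found" -eq 1 ] || printf '%s\n' "$f"
  done
)
```

For example, `missing_in_second A B` would print the path of every source file in A that has no processed counterpart in B. The printed list is still newline-separated, so it is meant for reading; a script consuming it would need a different separator.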

John1024 answered Jan 29 '23 08:01