List files that are in directory1 but NOT in directory2 and vice versa?

Question

Hey, I've just started bash shell scripting and I'm trying to write a script for an assignment. Given two directories, it should check that they exist and display an appropriate error message if they don't; if both directories DO exist, it should list the differences between them.

$ cd dir-1
$ myshellscript . dir-2 (comparing . aka dir-1 against dir-2)

Output:

Files that are in . but not in dir-2
-rw------- 1 ddddd users   1 2011-03-1 01:26 123123123

Files that are in dir-2 but not in .
-rw------- 1 ddddd users   1 2011-03-1 01:26 zzzzzzzzzzzz

What I have so far, which neither detects whether a directory exists nor lists the differences:

dir-1=$1
dir-2=$2

if [ $# > 2  ]
   then
      echo "Usage: compdir dir-name1 dir-name 2"
      exit 1
   elif [ $# < 2 ]
      then
         echo "Usage: comdir dir-name1 dir-name 2"
   elif [ ! -d "$@" ]
      then
         echo "/$@ is not a valid existing directory"
   else
      exit 0
fi

echo $dir-1
echo $dir-2

I am restricted to the list of commands linked below; otherwise I would have used comm -32 <(ls -la dir-1) <(ls -la dir-2)

http://dl.dropbox.com/u/20930447/index.html

SiegeX · Accepted Answer

awk '{a[$0]++}END{print "some message"; for(i in a)if(a[i]<2){print i}}' <(ls -1 dir2) <(ls -1 dir1)

Proof of Concept

$ ls -1 dir1
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt

$ ls -1 dir2
file1.txt
file3.txt
file4.txt

$ awk '{a[$0]++}END{print "Files in dir1 but NOT in dir2"; for(i in a)if(a[i]<2){print i}}' <(ls -1 dir2) <(ls -1 dir1)
Files in dir1 but NOT in dir2
file5.txt
file2.txt
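Note that the one-liner prints every name that occurs in only one of the two listings, so if dir2 also held files missing from dir1 they would appear under the same heading. A minimal sketch that reports one direction at a time, assuming file names contain no newlines; only_in is a hypothetical helper name, and the blank separator line is just a trick to avoid needing process substitution:

```shell
# only_in A B: print names found in directory A but not in directory B.
# Hypothetical helper; assumes file names contain no newline characters.
only_in() {
  # Feed awk the listing of B, one blank separator line, then the listing of A.
  # A file name can never be empty, so the blank line is a safe separator.
  { ls -1 "$2"; printf '\n'; ls -1 "$1"; } |
    awk 'second     { if (!($0 in seen)) print; next }  # listing of A
         $0 == ""   { second = 1; next }                # separator reached
                    { seen[$0] = 1 }                    # listing of B'
}
```

With the directories from the proof of concept, only_in dir1 dir2 would print file2.txt and file5.txt, and only_in dir2 dir1 would print nothing.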

nhed · Answer

A bit crude, but the easiest way, and the one I always use, is this (you can play with the diff params; I typically use a different grep):

diff -rcw DIR1 DIR2| grep ^Only

then you can sort and format as you like

Revised to format (less efficient as we are running diff twice here ... easily solved)

echo files only in $dir1
LST=$(diff ${dir1} ${dir2}| grep "^Only in ${dir1}"| sed 's@^.*: @@')
(cd ${dir1}; ls -l ${LST})

echo files only in $dir2
LST=$(diff ${dir1} ${dir2}| grep "^Only in ${dir2}"| sed 's@^.*: @@')
(cd ${dir2}; ls -l ${LST})

Expanding on the sed expression above:
s = search and replace
the three '@' separate the expressions (this is TRADITIONALLY done with '/')
^ matches the beginning of a line (forces the rest not to match elsewhere)
. means any character
* means the previous expression (. == match any char) 0-N times
": " is what I matched on from the diff output "Only in X: "
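To see the expression in isolation, here is a small sketch that wraps it in a function and feeds it a hand-written line in the same shape as the diff output. The name strip_only_prefix is hypothetical, chosen only for this illustration:

```shell
# strip_only_prefix: read "Only in DIR: NAME" lines on stdin, print just NAME.
# Hypothetical helper wrapping the same sed expression as above.
strip_only_prefix() {
  # ^.*:<space> matches from the start of the line up to the LAST ": ",
  # because .* is greedy; the empty replacement deletes that prefix.
  sed 's@^.*: @@'
}
```

For example, printf 'Only in dir1: file2.txt\n' | strip_only_prefix prints file2.txt. Because the match is greedy, a file name that itself contains ": " would be cut at its last occurrence.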

Look Mommy, no hands: now without 'sed', it's beginning to be less and less crude

XIFS="${IFS}"
IFS=$'\n\r'
for DIFFLINE in $(diff ${dir1} ${dir2}|grep ^Only); do
  case "${DIFFLINE}" in
   "Only in ${dir1}"*)  
    LST1="${LST1} ${DIFFLINE#*:}"
    ;;
   "Only in ${dir2}"*)  
    LST2+="${DIFFLINE#*:}"
    ;;
  esac
done
IFS="${XIFS}"

echo files only in $dir1
(cd ${dir1}; ls -l ${LST1})

echo files only in $dir2
(cd ${dir2}; ls -l ${LST2})

You will probably want to know about IFS. It needs some reading in the bash manual, but it's basically the set of field separator characters. By default they include spaces, and I don't want the loop to be fed fractions of lines, just complete lines, so for the duration of the loop I override the default IFS to just newlines and carriage returns.
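A small sketch of the effect, using two hand-written lines in the shape of the diff output. count_items is a hypothetical helper, and the newline-only IFS is written as a literal line break here so the snippet does not depend on the $'...' quoting:

```shell
# count_items TEXT: count how many items a for-loop sees when TEXT is
# expanded unquoted, i.e. after splitting on the current IFS.
# Hypothetical helper for illustration only.
count_items() {
  n=0
  for item in $1; do   # unquoted on purpose so word splitting applies
    n=$((n + 1))
  done
  echo "$n"
}

lines='Only in dir1: file2.txt
Only in dir1: file5.txt'

# Default IFS splits on spaces as well as newlines: prints 8
count_items "$lines"

# Newline-only IFS inside a subshell, so nothing needs restoring: prints 2
(IFS='
'; count_items "$lines")
```

Running the override in a subshell is one alternative to saving and restoring IFS by hand as the XIFS variable does.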

BTW maybe your professor is reading stackoverflow, maybe next you won't be allowed to use semicolons ;-) ... (back to 'man bash' ... BTW if you do 'man bash', do it in emacs; it makes it much easier to read IMO)

Brian Carlton · Answer

This almost works. It mainly fails when differing files fall at similar positions alphabetically in the two directories.

sdiff -s <(ls -1 dir1) <(ls -1 dir2)

Ezra · Answer

The basic recipe for what you want to do is already implemented by the diff utility, which is available on Unix-like systems, or via Cygwin or GnuWin on Windows. You should exploit this fact.

If I have directory a and b with the following contents:

ezra@ubuntu:~$ ls -R
.:
a  b

./a:
d  e  f  x  y  z

./b:
i  j  k  x  y  z

The x, y, and z are exactly the same in each directory.

I can achieve what you want using the diff command like this:

ezra@ubuntu:~$ diff a b
Only in a: d
Only in a: e
Only in a: f
Only in b: i
Only in b: j
Only in b: k

If I add a new file to each directory (both named new, but with different contents), I get the following:

ezra@ubuntu:~$ diff a b
Only in a: d
Only in a: e
Only in a: f
Only in b: i
Only in b: j
Only in b: k
diff a/new b/new
1c1
< ezraa
---
> ezra

That is, it'll even tell you how and where the files differ. Of course, if you don't want or need this functionality, you're free to not use it.

You also get the following:

ezra@ubuntu:~$ diff a c
diff: c: No such file or directory

With the heavy-lifting of this program done by diff, most of what you write will be parsing the output of this command, and then manipulating or outputting it as you see fit.

One of awk or sed might be of particular interest when you're doing this.
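As a rough sketch of that parsing step, the two functions below check the arguments and then pull the names out of the "Only in" lines with awk. The function names are hypothetical, the -q flag is assumed to be available (it is in GNU and BSD diff), and directory names are assumed to contain no backslashes, since awk -v would interpret them:

```shell
# only_in_from_diff DIR OTHER: names that diff reports as existing only in DIR.
# Hypothetical helper; relies on diff printing "Only in DIR: NAME" lines.
only_in_from_diff() {
  # LC_ALL=C keeps diff's messages in English; -q skips per-file content diffs.
  # index() does a literal prefix match, so a DIR of "." is not a wildcard.
  LC_ALL=C diff -q "$1" "$2" |
    awk -v prefix="Only in $1: " \
      'index($0, prefix) == 1 { print substr($0, length(prefix) + 1) }'
}

# compare_dirs DIR1 DIR2: validate both arguments, then list both directions.
compare_dirs() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: compare_dirs dir-name1 dir-name2" >&2
    return 1
  fi
  for dir in "$1" "$2"; do
    if [ ! -d "$dir" ]; then
      echo "$dir is not a valid existing directory" >&2
      return 1
    fi
  done
  echo "Files that are in $1 but not in $2"
  only_in_from_diff "$1" "$2"
  echo "Files that are in $2 but not in $1"
  only_in_from_diff "$2" "$1"
}
```

With the a and b directories above, compare_dirs a b would list d, e and f under the first heading and i, j and k under the second. To get the long listing the question asks for, the names could be handed to ls -l from inside each directory.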
