
File Comparison via Byte Array issues

I am writing a class that compares the files in two directories by comparing each file's byte array. However, I am not getting the expected results: identical files are not being recognized as identical.

First problem:

Comparing the files' byte[] contents with equals() returns false even for matching files. (I tested with a single file to rule out a possible index misalignment; the check still returns false.)

Second problem:

When I use Vector's containsAll() to check that both Vectors of byte[] match (one Vector per directory, with one byte[] per file), the check returns false even for identical directories. (This check has been removed from the code below.) Is there an issue with the way I am aligning the two Vectors? (I tested with two directories containing matching files, loaded in the same order at matching indices; this still results in a Vector mismatch.)

Third problem:

When the directories being checked contain subdirectories, a FileNotFoundException is thrown stating that access is denied. Why is this happening, and how can I work around it? I do not want to check the files inside the subdirectories, but I am designing the code so that the end user need not worry about whether the directories being compared contain subdirectories. This only happens when subdirectories are present; it works fine when there are none.

Example Exception:

Byte reading error!
Byte reading error!
java.io.FileNotFoundException: C:\Dir1\Dir2\Dir3\Dir4\SubDir (Access is denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(Unknown Source)
    at tools.filesystem.filecomparison.FileComparator.getBytes(FileComparator.java:166)
    at tools.filesystem.filecomparison.FileComparator.main(FileComparator.java:102)
java.io.FileNotFoundException: C:\Dir1\Dir2\Dir3\Dir4\SubDir Files (Access is denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(Unknown Source)
    at tools.filesystem.filecomparison.FileComparator.getBytes(FileComparator.java:166)
    at tools.filesystem.filecomparison.FileComparator.main(FileComparator.java:111)

Here is the code:

package tools.filesystem.filecomparison;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import java.util.Vector;

public class FileComparator
{
    public static void main(String[] args)
    {
        String workingDir1 = "";
        String workingDir2 = "";

        File[] fileArr1 = null;
        File[] fileArr2 = null;

        Vector<File> fileVec1 = new Vector<File>();
        Vector<File> fileVec2 = new Vector<File>();

        Scanner console = new Scanner(System.in);
        while (true)
        {
            System.out.println("Enter working directory one . . . .");
            workingDir1 = console.nextLine();
            workingDir1.replace("\\", "\\\\");

            System.out.println("Enter working directory two . . . .");
            workingDir2 = console.nextLine();
            workingDir2.replace("\\", "\\\\");

            File folder1 = new File(workingDir1);
            File[] listOfFiles1 = folder1.listFiles();

            File folder2 = new File(workingDir1);
            File[] listOfFiles2 = folder2.listFiles();

            fileArr1 = listOfFiles1;
            fileArr2 = listOfFiles2;

            System.out.println("\nWorking Directory 1 Files\n");
            for (int i = 0; i < listOfFiles1.length; i++)
            {
                if (listOfFiles1[i].isFile())
                {
                    System.out.println("    " + listOfFiles1[i].getName());
                } 
/*              else if (listOfFiles1[i].isDirectory())
                {
                    System.out.println("Directory " + listOfFiles1[i].getName());
                }*/
            }

            System.out.println("\nWorking Directory 2 Files\n");
            for (int i = 0; i < listOfFiles2.length; i++)
            {
                if (listOfFiles2[i].isFile())
                {
                    System.out.println("    " + listOfFiles2[i].getName());
                } 
/*              else if (listOfFiles2[i].isDirectory())
                {
                    System.out.println("Directory " + listOfFiles2[i].getName());
                }*/
            }

            for (File fle : fileArr1)
            {
                fileVec1.add(fle);
            }

            for (File fle : fileArr2)
            {
                fileVec2.add(fle);
            }

            if (fileVec1.containsAll(fileVec2))
                break;
            else
            {
                System.out.println("Directories do not contain the same files!\nContinue anyways? y/n?");
                if (console.nextLine().equalsIgnoreCase("y"))
                    break;
                else if (console.nextLine().equalsIgnoreCase("n"))
                    continue;   
            }
        }

        Vector<Vector<File>> alignedVectors = align(fileVec1, fileVec2);

        fileVec1 = alignedVectors.elementAt(0);
        fileVec2 = alignedVectors.elementAt(1);

        Vector<byte[]> fileByteVect1 =  new Vector<byte[]>();
        Vector<byte[]> fileByteVect2 =  new Vector<byte[]>();
        try
        {
            fileByteVect1 = getBytes(fileVec1);
        } 
        catch (IOException e)
        {
            System.out.println("Byte reading error!");
            e.printStackTrace();
        }
        try
        {
            fileByteVect2 = getBytes(fileVec2);
        } 
        catch (IOException e)
        {
            System.out.println("Byte reading error!");
            e.printStackTrace();
        }

        boolean[] check = new boolean[fileByteVect1.capacity()];

        int i1 = 0;
        //debug
        for (byte[] e : fileByteVect1)
        {
            System.out.println("Vector 1 count " + i1);
            System.out.println(e.toString());
            for (byte b : e)
            {
                System.out.print(b + " ");
            }
            i1++;
        }

        int i2 = 0;
        //debug
        for (byte[] e : fileByteVect2)
        {
            System.out.println("Vector 2 count " + i2);
            System.out.println(e.toString());
            for (byte b : e)
            {
                System.out.print(b + " ");
            }
            i2++;
        }

        if (fileByteVect1.size() == fileByteVect2.size())
        {
            System.out.println(fileByteVect1.size());
            for (int i = 0; i < fileByteVect1.size(); i++ )
            {
                if (fileByteVect1.elementAt(i).equals(fileByteVect2.elementAt(i)))
                {
                    check[i] = true;
                    System.out.println("File at index " + i + " are identical");
                }
                else
                {
                    check[i] = false;
                    System.out.println("File at index " + i + " are not identical");
                }
            }
        }
        else
            System.out.println("Files do not match!");
    }

    public static Vector<Vector<File>> align(Vector<File> fileVect1, Vector<File> fileVect2)
    {
        Vector<Vector<File>> mainBuffer = new Vector<Vector<File>>();
        Vector<File> bufferFileVect = new Vector<File>();
        for (File fle1 : fileVect1)
        {
            for (File fle2 : fileVect2)
            {
                if (fle1.getName().equals(fle2.getName()))
                    bufferFileVect.add(fle2);
            }
        }

        mainBuffer.add(fileVect1);
        mainBuffer.add(bufferFileVect);

        return mainBuffer;
    }

    public static Vector<byte[]> getBytes(Vector<File> fileVector) throws IOException
    {
        Vector<byte[]> outVector = new Vector<byte[]>();

        for (File file : fileVector)
        {
            InputStream is = new FileInputStream(file);

            // Get the size of the file
            long length = file.length();

            if (length > Integer.MAX_VALUE)
            {
                System.out.println("File is too large!");
            }

            // Create the byte array to hold the data
            byte[] bytes = new byte[(int) length];

            // Read in the bytes
            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0)
            {
                offset += numRead;
            }

            // Ensure all the bytes have been read in
            if (offset < bytes.length)
            {
                throw new IOException("Could not completely read file " + file.getName());
            }

            // Close the input stream and return bytes
            outVector.add(bytes);
            is.close();
        }
        return outVector;
    }
}
TheWolf asked Oct 31 '10 03:10


2 Answers

The equals() method isn't comparing contents here; for a byte[] it only compares references, so two separate arrays are never equal even when they hold the same bytes. Instead, you should use

Arrays.equals(fileByteVect1.elementAt(i), fileByteVect2.elementAt(i))

to compare the contents of the byte arrays.

More detail on Arrays.equals.
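
A minimal sketch of the difference, using two separately allocated arrays with the same contents. The class and method names are hypothetical and exist only for illustration:

```java
import java.util.Arrays;

class ByteArrayEqualityDemo {
    // Reference comparison: true only when both variables refer to the same array object.
    static boolean sameReference(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Content comparison: true when the lengths match and every element is equal.
    static boolean sameContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 3};
        System.out.println(sameReference(first, second)); // prints false
        System.out.println(sameContent(first, second));   // prints true
    }
}
```

Two byte arrays read from identical files are in the same situation as first and second here: equal contents, but distinct objects.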

As for your third question, you're not actually filtering out the directories: the isFile() check only guards the printing, while every entry, subdirectories included, is still added to the Vector. Build the Vector in the same loop that prints the file names:

for (File fle : fileArr1) {
    if (fle.isFile()) {
        fileVec1.add(fle);
        System.out.println("    " + fle.getName());
    }
}

You will, of course, have to do this for fileArr2 and fileVec2 as well.
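
Putting both suggestions together, one possible shape for the comparison is sketched below. It assumes Java 7 or later for java.nio.file.Files.readAllBytes, which is a swapped-in replacement for the manual read loop in the question, and it assumes files are paired by name. DirectoryComparison and compareByName are hypothetical names:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class DirectoryComparison {
    // Maps file name -> "contents identical?" for every regular file in dir1
    // that has a same-named regular file in dir2. Subdirectories are skipped.
    static Map<String, Boolean> compareByName(File dir1, File dir2) throws IOException {
        Map<String, Boolean> results = new TreeMap<>();
        File[] entries = dir1.listFiles();
        if (entries == null) {
            return results; // dir1 is not a directory or could not be read
        }
        for (File first : entries) {
            File second = new File(dir2, first.getName());
            if (first.isFile() && second.isFile()) {
                byte[] a = Files.readAllBytes(first.toPath());
                byte[] b = Files.readAllBytes(second.toPath());
                results.put(first.getName(), Arrays.equals(a, b));
            }
        }
        return results;
    }
}
```

Reading whole files into memory mirrors the approach in the question; for very large files a streamed comparison would likely be a better fit.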

Mark Elliot answered Oct 31 '22 22:10


Simple. The equals(Object) method on an array is inherited from Object, and hence is equivalent to the == operator; i.e. it is just a reference comparison.

This is specified in JLS 6.4.5.

If you want to compare arrays by value, use the java.util.Arrays.equals(array1, array2) methods. There are overloads for arrays of each primitive type and arrays of Object.

(Note that it is the semantics of each element type's implementation of equals method that determines if Arrays.equals(Object[], Object[]) is a "deep" or "shallow" comparison.)
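
The same reference comparison would also explain the second problem, since Vector.containsAll() relies on equals() for each element. A minimal sketch of one workaround, assuming it is acceptable to wrap each byte[] in a java.nio.ByteBuffer (whose equals() compares the remaining content). ByteContentCollections and its methods are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class ByteContentCollections {
    // Wraps each array in a ByteBuffer so that equals() compares bytes, not references.
    static List<ByteBuffer> wrapAll(List<byte[]> arrays) {
        List<ByteBuffer> wrapped = new ArrayList<>();
        for (byte[] array : arrays) {
            wrapped.add(ByteBuffer.wrap(array));
        }
        return wrapped;
    }

    // True when every array in 'second' has a content-equal array somewhere in 'first'.
    static boolean containsAllByContent(List<byte[]> first, List<byte[]> second) {
        return wrapAll(first).containsAll(wrapAll(second));
    }
}
```

Because Vector implements List, the two Vector<byte[]> instances from the question could be passed in directly. Note that containsAll() ignores order and duplicates, so it is a weaker check than an index-by-index comparison.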

FOLLOW UP

I suspect that the third problem happens because your application is trying to open the subdirectory as a file. That won't work. Instead, you need to:

  1. Use File.isFile() and File.isDirectory() to determine whether each directory entry should be read as a file, handled as a directory, or skipped altogether.
  2. For a directory, you should recursively use File.listFiles() or similar to iterate over the subdirectory contents.
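
The two steps above can be sketched roughly as follows. FileCollector and collectFiles are hypothetical names, and the recursive flag is an assumption added so that subdirectories can simply be skipped, as the question asks:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FileCollector {
    // Collects the regular files in 'dir'. Subdirectories are descended into
    // only when 'recursive' is true; otherwise they are ignored.
    static List<File> collectFiles(File dir, boolean recursive) {
        List<File> result = new ArrayList<>();
        File[] entries = dir.listFiles(); // null if 'dir' is not a directory or cannot be read
        if (entries == null) {
            return result;
        }
        for (File entry : entries) {
            if (entry.isFile()) {
                result.add(entry);
            } else if (recursive && entry.isDirectory()) {
                result.addAll(collectFiles(entry, true));
            }
        }
        return result;
    }
}
```

With recursive set to false, only the top-level files are returned, so no subdirectory is ever handed to FileInputStream.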
Stephen C answered Oct 31 '22 22:10